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PREFACE 

The Seventh Copper Mountain Conference on Multigrid Methods was held on April 
2-7, 1995, at Copper Mountain, Colorado, and was sponsored by NASA and the 
Department of Energy. The University of Colorado, Front Range Scientific Computations, 
Inc., and the Society for industrial and Applied Mathematics provided organizational 
support for the conference. 

This document is a collection of many of the papers that were presented at the con- 
ference and thus represents the conference proceedings. NASA Langley has graciously 
provided printing of this book so that all of the papers could be presented in a single 
forum. Each paper was reviewed by a member of the conference organizing committee 
under the coordination of the editors. 

The multigrid discipline continues to expand and mature, as is evident from these 
proceedings. The vibrancy and diversity in this field are amply expressed in these 
important papers, and the collection clearly shows the continuing rapid growth of the 
use of multigrid acceleration techniques. 
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MULTIGRID HISTORY 

(At the awards ceremony of the conference, Achi Brandt presented the following 
history of multigrid. The reader should study the truths contained herein and revel in 
the humor.) 

The early history of multigrid has recently become a hot subject of research. An 
ancient multigrid code was uncovered during extensive excavations last year in northern 
Turkestan. Carbon tests indicate that this code has an efficiency of 5.1 on the Richter 
scale. Some researchers believe that the V cycle was practiced by the Neanderthals. 
The use of the Full Multigrid (FMG) algorithm was, however, unique to Homo sapiens and 
is one of the major reasons for their ultimate survival. Prototypes of two-grid algorithms 
predate the first hominids. Most historians agree that coarsening was, in fact, invented 
by the dinosaurs; however, coarse-to-fine grid transfers were unknown to them, which 
explains their extinction. 

Earlier geological findings include rich multilevel deposits that have been unearthed 
in several North American gold mines, and thick layers of old multigridders have been 
discovered at Copper Mountain. 

The artifacts at the northern Turkestan site indicate that an early form of residual 
weighting was already in widespread use before the middle Full Approximation Storage 
(FAS) period. When Copernicus first introduced line relaxation, it was banned by the 
Catholic church. Pope Pointus the Square decreed that mere mortals should not practice 
such nonlocal schemes. He feared this practice would lead humanity to incompleteness, 
in particular to the incomplete LU decomposition of the Dutch church. The advent of 
variational coarsening during the French Revolution marks the dawn of the modern era, 
which is quite familiar to us all. 
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A PRESSURE BASED MULTIGRID PROCEDURE FOR THE 
NAVIER-STOKES EQUATIONS ON UNSTRUCTURED GRIDS 

R. Jyotsna and S. P. Vanka 
Department of Mechanical and Industrial Engineering 
University of Illinois at Urbana-Champaign, Urbana, IL. 61801 

ABSTRACT 

We present details and performance of a pressure based multigrid solution procedure for the 
Navier-Stokes equations discretized on triangular grids. The discretization uses a control volume 
methodology, with linear inter-nodal variation of the flow variables. The use of the multigrid 
technique provides rapid and grid-independent rates of convergence. Three model driven cavity 
flows are computed, and the performance of the method at several grid densities and Reynolds 
numbers is reported. Representative flow fields characterizing the viscous eddies are also 
presented. 

1. INTRODUCTION 

The multigrid technique [1] provides an efficient means of smoothing high and low 
frequency errors that arise during the iterative solution of elliptic equations. Multigrid acceleration 
of solution procedures on unstructured meshes has been demonstrated earlier for single elliptic 
equations [2,3], for Euler equations [4-7], and for the compressible Navier-Stokes equations [8]. 
These procedures have used complete remeshing to generate a sequence of independent coarse and 
fine grids. Because of the independence of the grids, inter-grid transfers are somewhat 
complicated. Another strategy to coarsen a given fine grid is 'volume agglomeration’, where the 
fine grid control volumes are progressively combined to obtain coarser control volumes. The 
resulting coarse grid volumes in this procedure do not have the same shapes as those of the finest 
grid, thus requiring special practices for constructing the discrete operators. The volume 
agglomeration technique is reviewed in reference [6]. 

The present paper describes a pressure based multigrid calculation procedure for unstructured 
grids. The discretization scheme is based on a control volume integration of the governing 
equations analogous to the practices followed in references [9-12]. On any given grid, the solution 
procedure employs a decoupled relaxation in conjunction with a pressure equation obtained 
through combination of the continuity and momentum equations in a special way [10]. In contrast 
with the coupled multigrid procedure followed in Vanka [13], and recently in Webster [14], the 
decoupled solution procedure is simpler to implement, and is better suited for use with a variety of 
linear solvers. In this paper, we discuss the details of the multigrid implementation, and its 
performance in three model driven-cavity flows. We have considered as examples, flows in a 
square cavity, a triangular cavity, and a semicircular cavity. The flow domain is discretized by 
Delaunay triangulation [15], with the fine grid obtained by uniform refinement of each triangle. In 
the following sections, we first describe the single grid procedure and its performance at increasing 
refinements of the mesh. Next, we describe the details of the components of the multigrid 
procedure (coarse grid equations, restriction, prolongation). The performance of the procedure in 
the three configurations at increasing Reynolds numbers is next presented along with brief 
descriptions of the flow fields. 

2. GOVERNING EQUATIONS AND DISCRETIZATION PROCEDURE 

Currently, we consider only the Navier-Stokes equations governing a two-dimensional, 
steady, incompressible flow of constant fluid properties. Thus the equations that are solved can be 
written in primitive variables (u, v, p) as 
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V • (u u ) = - (3p / 9x ) + 

vV*(Vu) + B u 

(1) 

V • (u v ) = - (dp / dy ) + 

v V • ( Vv) + By 

(2) 

V • u = 0 


(3) 


Here u and v are the two components of the velocity vector u, and p is the pressure divided by the 
density; v is the kinematic viscosity, and B u and B v provide a means to include other forces such 
as those due to gravity and rotation. 

The above equations are discretized on a triangular mesh shown in Figure 1(a). We use a 
control volume procedure essentially the same as that described in Prakash and Patankar [10], 
except that we have preferred to retain the central differencing scheme. In Prakash and Patankar 
[10] and related works, an exponential variation was introduced for stability at high cell Peclet 
numbers. Such a differencing scheme, although it provides stability, reduces the accuracy to first 
order, and is not satisfactory. Currently we have refined the finest mesh, until the cell Peclet 
number decreases below the stable value. Thus for a given grid, there exists a maximum flow 
Reynolds number that cannot be exceeded. 

Figure 1(a) shows the control volume constructed around a representative node P, by joining 
the centroids of the relevant triangles to the midpoints of the sides. The equations are integrated 
over each of these control volumes to obtain nodal values of pressure and velocity. The checker- 
board split in the pressure field that arises in such equal-order interpolation is avoided, by requiring 
a different set of velocities (u, v), located at the cell interfaces, to satisfy mass continuity. This 
practice is similar to the momentum interpolation concept used in collocated finite volume schemes 
[16-18], 

The Momentum Balances 

Integrating equation (1) over the discrete control volume ABCDEF and using the divergence 
theorem, we have 

s J[(uu - vVu)-n]dS = V J(B U - |^)dV (4) 

where S is the enclosing surface of control volume V. 

Consider now element PAB (Figure 1(b)), which has two faces ajc and ca 3 bounding the 

control volume around P. The contributions from these two surfaces to the flux balance can be 
written as 


c a 3 

a] I ( J u * n ) dS +C J ( J u • n ) dS - p^S ( B u - £ ) dV (5) 

where J u = u u - v V u 

To compute the flux J u , we use a linear interpolation of velocities between the nodes of PAB. 

Pressure is also assumed to vary linearly. Further, it is convenient to integrate the flux terms in 
local coordinates (X, Y), defined with the origin at the centroid of the element. The components of 
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J u are then expressed in terms of the nodal values of u because of the linear interpolation used. 

Using Simpson's rule to evaluate the integrals, it can be shown that after collecting like terms and 
simplifying the complete equation, the resulting equation has the form 

A p up = lA nb u nb - V p < B u - |£>p (6) 

where up is the value of u at point P and u nb represents values at the neighboring nodes A, B, C, 
D, E and F. Vp is the area of the control volume around P, and < > is an average defined by 

<B> = (lAfp)I e [(A i /3)B i ] (7) 

where Aj is the area of element i around P, and X e denotes summation over all the elements 
contributing to Vp. The expressions for the coefficients are not provided here, but can be derived 

by the above mentioned steps. Following the same procedure for equation (2), we can obtain the 
discretized y-momentum balance as 


Ap v p = lA nb v nb - Vp < B v - ^> P (8) 

It is convenient to define momentum velocities u and v as 

u = ( XA nb u nb ) / Ap, v = ( XA nb v nb ) / Ap (9) 

so that 

u = u + V p < B u - > / Ap and v = v + Vp< B y - |^> / Ap (10) 

The Continuity Equation 

In the present procedure, u and v located at the nodal points do not satisfy the continuity 
equation. Rather, the cell face fluxes are balanced for each control volume. These cell face fluxes 
are interpolants of the nodal values in a special way that preserves the connections between the 
nodal pressures. The practice is similar to the momentum interpolation scheme used in finite 
volume schemes with a collocated arrangement of velocities and pressure [16-18]. 

We define a new set of velocities u and v , located at the interfaces, and related to u and v 
by 


u =• u + D (B n - ^ ) and v = v + D(B V -^) (11) 

u ox v oy 

where D = Vp / Ap. The pressure gradients in equations (11) are evaluated locally for each 
element. The discrete continuity equation is obtained from 

V • u = 0 (3) 

written as 

S J ( u • n ) dS =0 (12) 
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The values of D at points within the element are linearly interpolated from the nodal values. The 
pressure gradients (8p/8x) and (8p/3y) are now local at the cell faces, and can be related to the 

nodal pressures ( pp, p^, pg ) because of the linear interpolation used. If the equations for ~ and 

are substituted in the two interface flux relations, the contributions from element PAB to the 
3y 

continuity at node P are obtained. Similar contributions from all elements surrounding P 
then provide a pressure equation at P given by 

A P PP P = 2A p nb p nb + M p (13) 

where Mp is the source term arising from the terms containing u, v and B u , B v . We now 
seek a solution (u, v, p) that satisfies the set of discrete equations (6), (8) and (13). 

3. SINGLE GRID SOLUTION STRATEGY AND PERFORMANCE 

The system of coupled equations (6), (8) and (13) has been previously solved by a sequential 
solution method, SIMPLER [19]. The iterative update involves solving in a cycle the pressure 
equation, followed by the two momentum equations. Starting from guessed velocity and pressure 
fields, the coefficients Ap and A nb are first assembled. Using these, the pressure equation is 

assembled through the above mentioned formulae. The pressure equation is then solved by any 
convenient linear solver. For simplicity, we have used a point Gauss-Seidel scheme, which is 
repeated a few (nswpp) times. This pressure field is then used to solve the velocity equations. The 
previously assembled Ap and A nb are used, and a few (nswpm) sweeps of the Gauss-Seidel 

scheme are made. The new velocity field is then used for calculating the next iterate of the pressure 
field. 


A point to mention is the under-relaxation used to hold the iterative process from becoming 
unstable. This is done by adding only a part of the change to the flow variables in an implicit 
manner by modifying the central coefficients and the source terms in the discrete equations. Figure 
2 shows the behavior of the single grid scheme for flow in a driven square cavity, discretized on a 
triangular grid with increasing number of elements. As is evident, the convergence deteriorates 
with increasing number of nodes, which significantly increases the cost of performing systematic 
mesh refinement studies. 

4. DETAILS OF THE PRESENT MULTIGRID IMPLEMENTATION 

Mesh generation and refinement 

In the present procedure, the coarsest mesh is first generated as for any single grid 
procedure, by the Delaunay triangulation method. Subsequent finer grids are then generated by 
successively dividing each element into four elements (Figure 3(a)). A prespecified number of 
nested grids are thereby obtained. Each coarse grid element shares three nodes with the daughter 
finer grid elements. This grid arrangement makes the intergrid transfers as well as the construction 
of coarse grid equations simpler than with the practice of using different meshes for each grid 
density [4,5,7]. However, it has the disadvantage that the coarsest grid may not be very smooth. 
Nevertheless, the boundary shape is still accurately captured because during refinements, the 
daughter nodes are moved to coincide with the boundary shape. 

The coarse grid discrete equations 
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Successful multigrid procedures rely heavily on consistent practices for the construction of 
the coarse grid equations and for the restriction and prolongation operators. Consistent restriction 
of variables and residuals to the coarser grids is the most important aspect of multigrid procedures 
for a system of equations, especially the fluid flow equations. For nonlinear equations, the Full 
Approximation Scheme (FAS) is the most suitable scheme for deriving the coarse grid equations. 
This is an extension of the more straight-forward Correction Scheme (CS) that is used for linear 
equations. 

Consider the discrete fine grid equations given by 

L f q f = F f (14) 


f f 

where L is the nonlinear operator matrix made of the convection and diffusion terms, q is the 

solution vector, and F is the right-hand side vector. The superscript f is used to denote the fine 
grid. After a few iterations on the fine grid, the residual is computed as 

R f = F f - L f q f (15) 


This residual is restricted to the next coarser grid, and it is required that the corrections satisfy the 
equation 





R 1 


(16) 


f- 1 f- 1 

where L is the nonlinear operator on the coarse grid, A q is the vector of corrections on the 

f-1 

coarse grid, and If is the restriction operator. For the FAS scheme, equation (16) is rewritten as 


T f-1 , A f-1 -f-1 f, T f-l„f ^ r f-1 f r f-1 Js 

L (Aq +If q ) = If R+L (If q) 


F f_1 + i/' 1 R f - (F f_1 


T f-1 n f-l f u 
L (If q ) ) 


or 


T f-1 f-1 pf-1 x n f-l D f D f-1 , 
L q = F +(If R-Rq ) 


(17) 

(18) 


f-1 

where Rq is the residual on the coarse grid, calculated using the restricted solution vector and 
f-1 

q is the solution on the coarse grid. After a fixed number of iterations on the coarse grid, the 
corrections implied by the coarse grid solution can be extracted from the relation 


Aq 


f-1 



T f " 1 n f 

- If q 


(19) 


The above FAS scheme is used in a straight-forward way for the momentum equations. The 
restriction and prolongation operators defined below provide a consistent and convergent multigrid 
procedure. The main complexity in the present scheme lies in the construction of the pressure 
equation which satisfies mass continuity not for the nodal velocities but for a different set of fluxes 
implicitly located at the cell faces of the control volume. As the success of the present procedure 
relies solely on this aspect, we give below details of the coarse grid pressure equation. 


413 



The FAS form of the coarse grid pressure equation that results from the continuity 
satisfaction condition is derived as follows. We begin with the correction equation 

(V • fi'/ -1 = i/" 1 R c f (20) 

where the prime denotes the correction in u, and the right-hand side is the restricted residual in the 
continuity equation. Equation (20) is expressed as 

V • (fi + fi'/' 1 = i/" 1 R c f + ( V ’ fi ) f_1 (21) 

Now, 

fi = G + D Vp and v = v + D Vp (22) 

where fi is the momentum velocity and Vp is the pressure gradient that is used to evaluate the cell 
face fluxes. For the coarse grid equations, the components of fi are defined as 

G = ( R u + ZA nb u nb ) / A p + (1 - a) u 
and 

v = (R v + £A nb v nb )/A p + (1 - a) v (23) 

where R u and R v are the net coarse grid momentum residuals defined from equation (21) as 

R = i/" 1 R f - R 0 f_1 (24) 

Substituting equations (22) in (21), the coarse grid continuity equation is given by 

V • (G + D Vp + u' + D Vp'/ 1 = 1^ 1 r/ + V • (G + D Vp/ 1 (25) 

f 1 f-1 f 

where p is the restricted pressure If p . Equation (25) can be further rewritten as 

V • (D Vp + D Vp'/" 1 = i/' 1 R c f - V • G f_1 + (V • D Vp + V • G) f_1 

T f-1 D f V7 *f-l , p f-1 n&\ 

= If R c - V • u + R c q (26) 

f-1 

where R c q is the coarse grid residual in the pressure equation calculated using the restricted 

values of the variables. It must be noted that because of the segregated method of solution, G' is set 
to zero for the pressure equation. Now, in the FAS practice, the left-hand side terms of equation 
(26) can be combined to give 

V • (D Vp/" 1 = - V • G f_1 + R c f_1 (27) 

f-1 

where p is now redefined to be 

f-1 T f -l J p f-1 T f-1 p f , p f-1 / 95n 

p = If p+(p) and R c = If R c + R c q (2o) 


414 



f-1 

Equation (27) has the standard structure of the pressure equation with an added residual R c 
Restriction and prolongation operations 

Restriction and prolongation operators for structured rectangular and curvilinear grids are 
now well established. For arbitrarily generated sequence of unstructured grids the intergrid 
transfers must be performed through systematic interpolations using appropriate geometric 
coordinates of the variable locations [2], An advantage of constructing fine grids embedded within 
the coarse grids is that the simple injection scheme can be used as the restriction operator for the 
nodal variables. Thus coarse grid values for (u, v, p) are obtained by locating the fine grid 
daughter nodes coincident with the considered coarse grid nodes. 

For the residuals in the momentum equations, several fine grid residuals are summed to 
obtain the corresponding coarse grid residual Iff" 1 Rf. We need to determine the fractions of the 
fine grid control volumes around a coarse grid node that contribute to the coarse grid control 
volume (see Figure 3(b)). The coarse grid control volume around P in two dimensions is given by 
the area ABCDEFGHIJKL. This is composed of fractions of the fine grid control volumes around 
each of the nodes P, A, B ... and L. It is apparent that the complete fine grid control volume 
around P contributes to the coarse grid volume. It can be shown that the rest of the coarse grid 
volume is made of the sum of half the fine grid volumes around each of the nodes A, B, ...and K. 
Therefore, the restricted residual at point P is the sum of the fine grid residual at point P, and half 
the fine grid residuals at the surrounding fine grid nodes. 

The prolongation process similarly is considerably simplified because of the mesh 
embedding. Coarse grid corrections to the solution are prolongated by direct injection at those fine 
grid nodes that coincide with the coarse nodes. For those fine grid nodes that lie in between the 
coarse nodes, the corrections are determined as averages of the corrections at the two surrounding 
coarse nodes. For example, in Figure 3(a), the coarse grid corrections at nodes P, A, and B are 
injected onto the next finer grid, whereas the corrections at a node such as D are determined as 
averages of the corrections at P and A. 

5. TEST CALCULATIONS 

We shall now present the performance of the algorithm in three model flow problems that 
illustrate the potential of the technique in calculating complex internal flows. The three selected 
problems reflect complex geometry, elliptic nature of the flow field and the presence of very fine 
scale variations in the flow that can only be resolved by a very fine mesh. In future, other problems 
that contain inflows and outflows, periodic boundary conditions and turbulence equations will be 
considered. The main point to be demonstrated here is that the method converges rapidly and that 
the rate of convergence is independent of the mesh density. In comparison with the single grid 
convergence shown in Figure 2, the multigrid method should save a large number of iterations. 
This is indeed the case as will be presented below. 

Laminar Flow in a Square Cavity 

We have conducted a systematic testing of the influence of the flow Reynolds number, the 
under-relaxation factors and the mesh density for three model driven cavity problems. The first one 
is the familiar problem of flow in a driven square cavity. In our tests, the square cavity is 
discretized by triangular elements. The triangulation is performed by the Delaunay procedure. 
Several levels of grid are then superimposed over the coarsest grid. Since upwinding was not used 
in the present study, for each mesh level, there was a limiting value of the Reynolds number 
beyond which convergence was not possible. Therefore, in the multigrid sequence, the desired 
Reynolds number was used only on the finest mesh. Iterations on each of the coarser meshes were 
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performed with its stable maximum value of the Reynolds number, following along the concept of 
double discretisation. Two fixed V- cycles were examined. In the first, the number of iterations on 
the coarse grids increased as the coarsest grid was approached. On the locally finest grid, one 
iteration was performed. The next grid used two relaxations and the subsequent one three and so 
on. The same number of relaxations were performed on the up-leg of the V-cycle, except at the top 
of the V-cycle. In the second fixed cycle, a fixed number of three coarse grid relaxations were 
performed accompanied by one. relaxation on the finest grid. Both schemes were well convergent 
except for minor differences in the rates of convergence and the CPU times. 

Figure 4 shows the convergence history for a Reynolds number of 50 for different mesh 
densities, with the mass residual plotted against the number of iterations on the finest grid. In all 
the runs, the coarsest grid had 40 elements and 29 nodes. The finest grid in the 5-grid run had 
10240 elements and 5249 nodes. It is apparent from the plots that the rate of convergence in all 
cases is nearly independent of the grid size. There is a five order decrease in the mass residual in 
less than 20 multigrid cycles. This may be compared with the convergence shown (for 640 
elements) if only a single grid is used. Figure 5 shows the multigrid convergence for the highest 
permitted Reynolds number of 500 which requires a slightly larger number of iterations due to the 
increased' nonlinearity. The calculated results agreed well with previously reported results of Ghia 
et al. [20] and Vanka [13]. 

Laminar flow in a triangular cavity 

The flow in a triangular cavity wherein the fluid motion is set by the motion of the top wall is 
an interesting complex flow which results in an infinite number of vortices of diminishing intensity 
towards the lower corner of the cavity [21, 22]. Although the square cavity has been studied 
extensively, there has been very little numerical work reported on the triangular cavity [23]. The 
triangular cavity cannot be easily discretized by a curvilinear mesh that is smooth and has high 
quality. However, it is ideally suited for triangulation. For the calculations presented here, the 
depth of the cavity is twice the width of the top wall. Here, as in the square cavity, the top wall is 
moved to the right with a velocity u = 1. A series of Reynolds numbers up to 800 were considered 
and the performance of the method was evaluated. Here the Reynolds number is defined with 
respect to the depth of the cavity and the top wall velocity. 

Figures 6 and 7 show the multigrid convergence of the code for Reynolds numbers of 50 
and 800. Linear convergence is observed even with 12288 elements and 6305 nodes. The velocity 
vectors and streamtraces in the flow field are shown in Figures 8 and 9 for Reynolds numbers of 
50 and 800. The occurrence of the series of vortices is replicated by the calculations to the point 
of grid resolution. Further resolution near the bottom comer should reveal more and more eddies 
of smaller dimension. Moffat [21] has shown that for Stokes flow, the distance of each eddy 
from the corner increases in geometric progression as does its intensity. This was indeed seen for 
all the eddies except for the one near the top wall. Therefore, starting from the second eddy, the 
ratios of successive distances from the corner for Re = 50 are respectively, 1.97, 1.98 and 1.9. 
The deviation from the expected series for the topmost eddy is probably because of the breakdown 
of the Stokes flow assumption there. Near the top wall, inertial effects dominate, and Moffat's 
analysis is not valid there. 

Laminar flow in a semicircular cavity 

The final problem considered is the flow in a semi-circular cavity which has a curved 
boundary. In this case, the coarsest triangulation does not capture the true shape of the boundary. 
However, as the mesh is refined, the fine grid points are moved to the boundary to fit the shape. 
Thus a better representation of the boundary is obtained. For this geometry also, several Reynolds 
numbers and mesh densities were considered. As a representative plot. Figure 10 shows the 
convergence for the Reynolds number of 500 discretized with 3584 elements and 1873 nodes. The 
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consistency of the coarse grid and fine grid transfers is demonstrated by this rate of convergence. It 
is to be noted that only the near boundary elements are altered and no remeshing is performed. This 
preserves the restriction/prolongation practices that are valid in the interior. The velocity vectors 
and streamtraces in the flow field for Re = 500 are shown in Figure 1 1 . 

Table 1 summarizes all the calculations currently performed with this procedure. The 
corresponding work units are, also presented, which accounts for the coarse grid iterations. The 
work involved in the injections and interpolations during restriction and prolongation is neglected 
as per the standard practice in multigrid literature. 

6. CONCLUSIONS 

In this paper, a multigrid method for unstructured grids based on geometric coarsening 
(versus algebraic coarsening, Webster [14]) has been presented. A sequence of embedded grids 
has been used to smooth out low frequency errors, and accelerate the convergence on fine grids. 
The momentum and continuity equations are discretized by a control volume procedure with equal 
order interpolations for the variables. The mass continuity equation is transformed to a pressure 
equation which is derived through special interpolations that provide a well-connected pressure 
field. A simple iterative scheme such as the Gauss-Seidel method has been used to relax the 
discrete equations on any grid. The coarse grid pressure equation is constructed by a consistent 
restriction of the cell face fluxes and appropriate equations. It is demonstrated that the method 
provides good multigrid convergence in the three test problems for all Reynolds numbers up to 
the value permitted by the cell Reynolds number criterion of the central differencing scheme. 
Future extensions to this procedure are underway to include periodic boundary conditions, 
turbulence models, time-dependent terms, and three-dimensional variations. 
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Figure 1: (a) Unstructured mesh with control volume around node P; (b) Element PAB and local 
coordinate system 



Number of Iterations 


Figure 2: Single grid convergence for shear driven flow in a square cavity with increase in number of 
elements 





Figure 4: Muitigrid and single grid convergence for laminar flow in a square cavity at Re = 50 
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Figure 5: Mulligrid and single grid convergence for laminar flow in a square cavity at Re = 500 



Figure 6: Multigrid and single grid convergence for laminar flow in a triangular cavity at Re = 50 
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Figure 10: Multigrid and single grid convergence for laminar flow in a semicircular cavity at Re = 500, 
with 3584 elements 
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Table 1: Number of fine grid iterations for a five order decrease in the residuals, shown as a function 
of the number of elements and the Reynolds number. Each fine grid iteration corresponds to three 
work units 



Semicircular cavity 
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Abstract 

A multigrid-mask method for solution of incompressible Navier-Stokes equations in primitive variable 
form has been developed. The main objective is to apply this method in conjunction with the pseudospec- 
tral element method solving flow past multiple objects. There are two key steps involved in calculating 
flow past multiple objects. The first step utilizes only Cartesian grid points. This homogeneous or mask 
method step permits flow into the interior rectangular elements contained in objects, but with the re- 
striction that the velocity for those Cartesian elements within and on the surface of an object should be 
small or zero. This step easily produces an approximate flow field on Cartesian grid points covering the 
entire flow field. The second or heterogeneous step corrects the approximate flow field to account for 
the actual shape of the objects by solving the flow field based on the local coordinates surrounding each 
object and adapted to it. The noise occurring in data communication between the global (low frequency) 
coordinates and the local (high frequency) coordinates is eliminated by the multigrid method when the 
Schwarz Alternating Procedure (SAP) is implemented. 

Two dimensional flow past circular and elliptic cylinders will be presented to demonstrate the versa- 
tility of the proposed method. An interesting phenomenon is found that when the second elliptic cylinder 
is placed in the wake of the first elliptic cylinder a traction force results in a negative drag coefficient. 

1 Introduction 

The motive to develop the multigrid-mask method is to remedy the drawback of grid generation which 
often results in a tremendous effort to achieve the desired layout of grid points for flow past multiple 
objects. As expected, the grid generation becomes even more difficult when the objects are close to 
each other or randomly moving. The situation occurs in many physical problems, such as cross flow in 
shell-tube heat exchangers, two phase flow in multiple particle sedimentation, and flow of blood cells in 
arteriols, capillaries, and venules (Stokes flow). 
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The conventional numerical simulation of Navier-Stokes (or Stokes) flow with multiobject systems falls 
into two main categories: (I) distinguishable and (II) indistinguishable fluid-object interfaces. Category I 
defines a distinct boundary between objects and fluid, and exact boundary conditions; velocity and force 
can be prescribed on the surface of objects. Actually, this category partitions the entire flow domain into 
two heterogeneous systems: objects (may or may not have fluid inside) and fluid system. It is capable of 
providing highly accurate details of flow interaction among objects but is computationally intensive (not 
more than three objects). Ingber [I] and Tran- Cong & Phan-Thien [2] use the boundary element method 
for suspensions of rigid particles in Stokes flow and Li, Zhou, & Pozrikidis [3] use the boundary element 
method for deformable particles. 

Category II implies that a fuzzy boundary exists between objects and fluid. In other words, there is 
no distinct boundary between objects and fluid; therefore, a homogeneous system can be applied to the 
entire domain. As a result, a single set of fluid dynamics equations holds at all grid points (a “stationary” 
grid) of the domain and no internal boundaries are necessarily defined, i.e., original boundary conditions, 
force on the fluid-object surfaces, now become the additional inhomogeneous source term in the Navier- 
Stokes equations. However, a sharp discontinuity for the velocity field (or other variables) between the 
fluid-object interfaces should be preserved in conformity with the original problem. In order to maintain 
a sharp front between fluid-object interfaces, the fuzzy boundary should be restricted to within a few 
mesh distances; the less the mesh distance, the better the resolution of fluid-object interfaces. A variety 
of means to achieve the desired sharp fluid-object interface are suggested by many investigators [4, 5, 
6]. Basically, the flow field is discretized by the finite difference approximation on a stationary grid 
to cover the entire flow domain. For the moving or deformable objects, a separate object grid which 
configures the geometry of objects needs to be defined, and this object grid is allowed to move with the 
speed interpolated from the stationary grid. The discussion of moving or deformable cases is beyond the 
current scope. 

Briscolini and Santangelo [5] proposed the spectral method to solve the incompressible unsteady flow 
over a circular cylinder by introducing a strip zone (or equivalent to stationary boundary layers in which a 
steep change of field variables occurs) of control within a few meshes. A narrow mask (Gaussian) function, 
defined as zero inside the objects and one elsewhere along with a smooth connection between these two 
values within the strip zone, is applied to the velocity field. The drawback of the mask method is that it 
only provides an approximate flow field due to an inexact capturing of the configuration of the objects by a 
stationary grid alone as well as the thickness of the fuzzy boundary (a few meshes wide) between the fluid 
and cylinder. Peskin [6] adopted the immersed boundary method for numerical simulation of blood flow 
in the human heart. His idea is very similar to the mask method of Briscolini and Santangelo [5] except a 
separate material grid is added to trace the heart wall movement. For the data communication between 
the stationary grid and the material grid, Peskin [6] employs an approximation to the delta-function to 
define the interpolated velocity and force transferred between the fluid-object system. 

The objective of this paper is to develop a numerical method which combines the desired features of 
both category I and II and that can also accurately simulate the flow interaction among multiple objects. 
In practice, it includes two major steps: (1) apply a stationary grid to obtain a fast solution covering the 
entire domain, which is similar to the category II approach but differs in some respects by requiring that 
the velocity for the stationary grid falling inside objects is imposed to be small or zero (a homogeneous 
step or mask method is hereafter named); and (2) generate a local fluid grid surrounding objects to 
exactly capture the surface configuration of objects, which is similar to category I by prescribing exact 
boundary conditions on the surface of objects (a heterogeneous step). Notice that step (1) only provides 
an approximate flow field and step (2) corrects the approximate flow field predicted from step (1) with 
the imposition of exact boundary conditions on the surface of objects. 

In domain overlapping terminology, one can regard the local fluid (or fine) grid as being fully over- 
lapped with the global stationary (or coarse) grid. A data communication process between the stationary 
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and fluid grid can be conducted by the Schwarz Alternating Procedure (SAP) [7]. Although the grid 
points of each grid system in the overlapping area are not coincided with each other, the SAP iterative 
scheme still can be used effectively for data communication between the stationary and fluid grid in con- 
junction with the multigrid method [8, 9]. The role of the multigrid method in the SAP process ensures 
a smooth data interpolation between the global stationary and local fluid grid without introducing any 
high-frequency error. 

The solution of the Navier-Stpkes equations is implemented by the pseudospectral element method, 
which is an extension of the global pseudospectral method to the element-type method by requiring that 
the function continuity c° be continuous across the interface between two adjacent elements [10] when 
calculating the derivatives of a function. 


2 Primitive Variable Formulation 


2.1 Navier-Stokes Equations 


In tensor notation, the time-dependent Navier-Stokes equations in dimensionless form can be described 
as 


duj 

dt 


+ Uj 


dui 

dxi 


dui 

dx{ 


dp 1 d 2 ui 

(la) 

dxi + Re dx 2 

= 0. 

(lb) 


Here u; is the velocity component and Re is the Reynolds number. 

The method applied to solve the Navier-Stokes equations is Chorin’s [11] splitting technique. Accord- 
ing to this technique, the equations of motion are written in the form 


duj dp 
dt + dx t 1 


( 2 ) 


where F{ = —uj dui/dxj+l/Red 2 Ui/dx 2 . 

The first step is to split the velocity into a sum of predicted and corrected values. The predicted 
velocity is determined by time integration of the momentum equations without the pressure term 


-n+1 


< + AtF?. 


(3) 


The second step is to determine the pressure and corrected velocity fields that satisfy the continuity 
equation by using the relationships 


dp 


< +1 = u? +1 - A t- 
1 ' dx; 


du? +1 

dxi 


= 0. 


(4a) 

(4b) 


Here the superscript n denotes the n-th time step. 

An equation for the pressure can be obtained by taking the divergence of Eq. (4a). In view of Eq. 
(4b), we obtain 

d 2 p _ _}_dv± 
dx 2 At dxi ' 

Note that the pressure solution on the global stationary grid is solved numerically by separation of 
variables [7], while the Generalized Conjugate Residual (GCR) method [12] is used to iteratively solve 
the pressure equation on the local fluid grid. 
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3 Domain Decomposition with Multigrid-Mask Method 

As mentioned in the section of Introduction, two major steps are involved for the calculation of flow past 
multiple objects: a homogeneous step as well as a heterogeneous step. Data communication between the 
stationary and fluids grid by the multigrid method will be described in the process of the heterogeneous 
step. Each step is addressed as follows. 


3.1 Mask Method - Homogeneous Step 


A single coordinate system is used to produce a stationary grid to cover the entire fluid-object domain. 
Usually, several stationary grid points are contained inside the objects. This homogeneous step is some- 
times called the mask method, which is analogous to that proposed by Briscolini & Santangelo [5] and 
Peskin [6]. In other words, it permits flow into the interior stationary grid points contained in the objects 
and considers the objects as a homogeneous (whole) system; no distinction between the fluid and objects 
is made. But the requirement that the velocity on the stationary grid points confined in the objects being 
small or zero should be met. 

According to this step, the Cartesian grid points can be extended to cover the interior of each object 
and the entire domain. Such an approach enables us to take advantage of the fast solution for the operator 
resulting from the desired feature of a complete Laplacian type. 

As pointed out in the Introduction, the mask method only provides an approximate flow field because 
the Cartesian grids contained in the objects cannot accurately represent the configuration of objects 
themselves. Besides, the flow field on the Cartesian grid points inside or on the surface of the objects 
should be prescribed in order to comply with the original problem, i.e., no flow or small velocity inside 
the objects (including on the surface). 

Such a criterion, equivalent to finding a predicted velocity u n+1 inside the objects as appeared in Eq. 
(4a), can be met by setting 


u? +1 = u? + Ai 


dp 

dxi 


( 6 ) 


on the Cartesian grid points confined in the interior of objects. Here superscript p refers to the prescribed 
velocity. Presumably, this should implicitly force u n+1 to be equal to the prescribed value. However, due 
to the nonsmooth flow field exhibited around the fluid-object interfaces, simply choosing the predicted 
velocity u n+1 to be zero or constant does not guarantee that the velocity u n+1 obtained from Eq. (4a) 
be u ? inside the objects after solving Eq. (5). Thus, the predicted velocity u n+1 inside the objects can 
be obtained by the repeated solution of Eqs. (5) and (6). Usually, only 1 to 2 iterations are required to 
ensure that || u n+1 — u p ||< 10 -4 after a few hundred time steps. 


3.2 Multigrid Method - Heterogeneous Step 

In order to correct the approximate flow field predicted from the homogeneous step (based on the station- 
ary grid), the heterogeneous step next accounts for the actual shape of the objects by adding their own 
local coordinates; an external fluid grid surrounds each object. Since the mask method does not define 
a distinct interface between fluid and objects, rather the fuzzy interface falls within a few meshes. As a 
result, such fluid-object interfaces need to be defined, and this is what the heterogeneous step tends to 
accomplish. The boundary conditions on the surface of objects are straightforward with no slip velocity. 

In view of the domain decomposition approach for flow past multiple objects, one can regard the 
local subdomains (fine grid referred to the fluid grid surrounding each object) fully overlapped with the 
global (coarse grid referred to the stationary grid) rectangular domain as depicted in Fig. 1. As for the 
data communication between the fluid and stationary grid, the iterative SAP technique will be naturally 
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suitable for this purpose, i.e., the global stationary grid provides the outer boundary information for the 
local fluid grid and in turn the local fluid grid corrects the flow field outside objects by imposing exact 
boundary conditions on the fluid-object interfaces. 

Due to the different orientation and resolution of each grid system, simply exchanging the data through 
interpolation in the overlapping area, stationary (coarse)-fluid (fine) grid system, causes the high frequency 
error induced by the fine-grid (fluid grid) subdomain and hence affects the results throughout the whole 
computational domain. The technique of filtering the high-frequency noise is also known as the multi- 
grid method. The coarse-grid correction process often used in the multigrid method is adopted in the 
overlapping area for the coupled pressure and velocity field and has been proposed by Ku & Ramaswamy 
[9]: 

Ve • U C - V c • (i/uy) = 7/(77 - V/ • Uy). (7) 

Here V c - represents the operator of divergence on the coarse-grid subdomain, 7/ is an interpolation 
operator from the fine-grid subdomain “/” to the coarse-grid domain “c,” u is the velocity, and 77 is the 
divergence of the velocity field which should be set to zero at the first SAP iteration. The left hand side 
of Eq. (7) is the difference between the coarse-grid operator acting on the coarse-grid domain and the 
coarse-grid operator acting on the interpolated fine-grid subdomain (which is held fixed). When the term 
V c • u c appearing in Eq. (7) is substituted by Eq. (4a) the pressure equation in the coarse-grid domain 
is thus governed, and so is the pressure equation in the fine-grid domain. Actually, Eq. (7) implicitly 
functions as a coupled equation between the pressure and velocity; not only the residual of the right hand 
side of Eq. (7) should be equal to zero but also the unchanged velocity field during the SAP iteration is 
required. 

In the overlapping area 77 cannot be predetermined and needs to be adjusted until the velocity field 
generated from the coupled pressure equations V c • u c = V c • (7/uy) and Vy • uy = 77 is unchanged. 

Once the residual 77 - V • uy and velocity field do not change in the fine-grid subdomain, this implies 
that 

u c = 7/uy. (8) 

Whenever either the residual 77 - V-uy or the velocity field in the fine-grid subdomain still varies, Eq. (7) 
acts as a coarse-grid correction process to transfer the correction of the velocity field back to the fine-grid 
subdomain, i.e., 

u f w = U f + 7y(u c - 7/ uf ). (9) 

This is vital for the success of the scheme. Changes in the velocity field are transferred back to the 
fine-grid subdomain rather than the velocity field itself. At each SAP iteration, 77 can be simply chosen 
as 77 = Vy • u] ew from Eq. (7). 

The multigrid-mask SAP iterative solution of the incompressible Navier-Stokes equations in primitive 
variable form for flow past multiple objects (also shown in Fig. 1) is summarized by the following algorithm: 

1. First assume u n+1 on the outer boundary of each object. Usually u n will be a good initial guess. 

2. Solve the fine-grid or fluid grid system, where the pressure solution is obtained by the preconditioned 
General Conjugate Residual (GCR) method. 

3. With the interpolated solution of u” +1 from step (2) through Eq. (8) in the overlapping area, solve 
the pressure on the coarse-grid domain (stationary grid) by the mask method with the eigenfunction 
expansion technique and also update Uy +1 in the overlapping area of the fine-grid domain by the 
coarse-grid correction process in Eq. (9). 

4. Repeat steps (2) & (3) until the velocity u n+1 in the overlapping area satisfies the convergence 
criterion of Eq. (8). 
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It is worthwhile to emphasize that even with strong discontinuity exhibited for the velocity on the 
grid points immediately outside the objects the multigrid-mask method indeed meets the requirements 
of both having small velocity inside the objects and satisfying of Eq. (8). 

4 Results and Discussion 

Four SAP iterations are employed for all the test problems, and the convergence criterion of Eq. (8) is 
satisfied by the requirement j| u c — J/uy ||< 2.5 x 10 -4 . The radiation boundary condition [8] is applied 
on the truncated downstream to give the least influence upon the upstream flow development. 

4.1 Circular cylinders 

For the first benchmark test, we choose a uniform flow over a cylinder to give a comparison of results 
between the multigrid scheme and the pseudospectral element method [9], in which the computational 
domain is decomposed into two subdomains: an “0” grid domain, partially overlapped with the Cartesian 
grid domain. The diameter of a cylinder over the width of a channel is 1/20 in this numerical experiment; 
18 x 15 elements (each element contains 7x7 points in the x and y directions) are allocated in the 
stationary grid system, and 15 x 6 elements in the fluid (or “0”) grid system. The periodic character of the 
flow motion can be defined by the Strouhal number S = fD/Umax > where / is the shedding frequency, D is 
the diameter of a cylinder, and U is the maximum inlet velocity. Numerical results of drag coefficient Cd 
and lift coefficient Cl predicted by the multigrid-mask method, 1.379 < Cd < 1.394, —0.263 < Cl < 0.263 
for Re = 100 and 1.328 < Cd < 1.481,-0.733 < Cl < 0.733 for Re = 250, are in good agreement with 
those calculated by the multigrid method of [9]: 1.36 < Cd < 1.385,-0.269 < Cl < 0.269 for Re = 100 
and 1.29 < Cd < 1.432,-0.711 <Cl< 0.711 for Re = 250. The Strouhal number, S = 0.168 at Re = 
100 and S = 0.208 at Re = 250, also reproduces the same results as those found in [9]. Streamline plots 
presented in Fig. 2 describe the typical flow motion behind the cylinder at Re = 100 and 250, respectively. 

We secondly examine Poiseuille flow past multiple cylinders at Re = 20 using the multigrid-mask 
method. Figs. 3 and 4 show both the element layouts of the stationary and fluid grids and streamline 
plots for flow over four cylinders with the shortest distance 1.828 (Fig. 3a) and 0.414 (Fig. 4a) diameter 
of the cylinder. Numerical results indicate that less flow rate goes through the intercylinder area when 
the case in Fig. 4b is compared with the case in Fig. 3b. Due to the relatively large flow rate going 
through the outer cylinders as shown in Fig. 4b a strong separation behind the fourth (or last) cylinder 
is observed. 

4.2 Elliptic cylinders 

In this case, an incoming uniform flow past a slender elliptic cylinder of thickness ratio (minor to major 
axis) 1:6.66 at a 45° incidence angle is studied. Reynolds number is chosen to be Re = 200 (based on the 
chord length which is twice that of major axis), and' the aspect ratio (the channel width over the chord 
length) is 20. The number of elements allocated for the stationary grid system is 14 x 16 elements in 
the x and y directions, and 14 x 4 elements are adopted for the fluid grid system. The detailed element 
layout is sketched for the first elliptic cylinder shown in Fig. 1. 

When the incidence angle is 45° and Reynolds number is Re = 200, a well-known Karman vortex 
street develops [13]. The streamline plots shown in Fig. 5 illustrate the history of separation behind the 
elliptic cylinder within a cycle. If one regards the separation starting from the leading edge as seen in Fig. 
5a, the time evolution of separation is described as follows: the separation region continues to increase 
toward the trailing edge (Fig. 5b) and up to the trailing edge where the maximal lift holds. After the 
separation breaks down (Fig. 5c), it restarts from the trailing edge (Fig. 5d) and then gradually extends 
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to the region toward the top tip (Fig. 5e), where the minimal lift occurs. The separation also splits into 
two parts: one is located immediately behind the ellipse, and another forms as a vortex behind the body 
(Fig. 5f). 

The drag and lift coefficients are found to be —0.985 < Cl < —1.500,1.355 < Cd < 1.781 (as seen 
in Fig. 6), which are qualitatively similar to the case with thickness ratio of 1:10 in [13]. The Strouhal 
number is 0.275 in contrast with 0.25 in the case of a thickness ratio of 1:10. 

To demonstrate the capability ,of the multigrid-mask method in simulating the interaction among 
multiple objects, we add another elliptic cylinder with thickness ratio 1:4 (chord length is 60% of the first 
one) in the direction of incoming flow. The element layout is also sketched in Fig. 1 and the position is 
placed in the wake of the first elliptic cylinder. It is very common for us to experience the traction force 
when we park a car and another high speed car passes by to us, or when a small plane flys into the wake 
of a big plane, a tremendous suction force can cause a small plane to crash into the big one. 

In order to prove that the traction force acting on the second elliptic cylinder is induced by the 
wake effect from the front one, it is rational to plot the time history of the drag coefficient at the rear 
one. If any negative value of drag coefficient exists, it supports our assumption. In Fig. 7, the drag 
and lift coefficients of both elliptic cylinders appear in the same plot. Evidently, the negative drag 
coefficient for the second one indeed stands and strengthens the fact that the traction force acts on the 
rear elliptic cylinder. Meanwhile, the drag and lift coefficients for the front elliptic cylinder also change 
(1.30 < Cd < 1.828,-0.82 < Cl < —1.39) due to the existence of the rear one. More strikingly, the 
Stouhal number is reduced to 0.208, which is the same as that of the rear elliptic cylinder (resonant 
effect), whose drag and lift coefficients are —0.139 < Cd < 0.360,-0.939 < Cl < 0.911, respectively. 

The streamline plots as seen in Fig. 8 give a detailed description of the aforementioned traction effect. 
The phenomenon of the front elliptic cylinder is very similar to that of the single case; separation starts 
from the leading edge and grows up to the trailing edge where the separation breaks down, then restarts 
from the trailing edge and extends toward the leading edge where it splits into two parts, one on the 
surface with a small intensity and another in the wake region. The traction force can be judged based 
on the vortex formation on the surface of the second elliptic cylinder. Whenever the vortex formation 
appears on the front surface of the second one, the drag coefficient turns into a negative value as indicated 
in Fig. 8c. The negative value persists during the time period (Fig. 8c - Fig. 8e) when the separation 
on the surface of the front elliptic cylinder breaks down at the tail and restarts from the bottom and 
extends toward the tip. The intensity of the traction force turns out to be the strongest when the wake 
zone resulting from the first elliptic cylinder acts on the front surface of the second one and becomes the 
largest (Fig. 8d). 

5 Conclusions 

The solution of the Navier-Stokes equations in primitive variable form has been obtained by the pseu- 
dospectral element method via the multigrid-mask SAP domain decomposition technique. The solution 
procedure for flow past multiple (or single) objects includes two basic steps: a homogeneous step (mask 
method) and a heterogeneous step of (multigrid method). The solution on the stationary grid is first 
solved by the mask method, then the iterative solution between the heterogeneous step, the solution on 
the fluid grid, and the homogeneous (mask) step is repeated by the SAP technique with multigrid method. 

The homogeneous step permits flow into the stationary grid contained in each object but subject to 
the restriction that flow inside or on the surface of objects should be small within the prescribed error 
index. The merit of the mask method is its simplicity to first provide an approximate solution of flow 
field by the fast eigenfunction solver. The implementation of heterogeneous step is next used to correct 
the flow field predicted from the homogeneous step by considering the actual contour and exact boundary 


431 



conditions on the surface of objects. 

From the solution point of view, the problem can be interpreted as the local fluid grid representing 
the objects fully overlapped with the global stationary grid standing for the entire computational domain. 
The SAP iterative technique bridges the data com m unication between the local and global coordinate 
systems. During the data exchange between the fluid grid (fine-grid) domain and the stationary grid 
(coarse-grid) domain, the coarse-grid correction technique is used to eliminate the high frequency error 
caused by the data interpolation from the fine-grid domain to the coarse-grid domain. 

Test problems demonstrate the versatility of the proposed multigrid- mask method. Future research 
will concentrate on solution of flow in the three-dimensional geometries. 
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Fig. 2 Streamline plots for flow past a cylinder for (a) Re = 100, 
and (b) Re = 250 
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Fig. 4 Flow past four cylinders at Re = 20 with (a) element layout, 
and (b) streamline plot 
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Fig. 8 Time history of streamline plots for Re = 200 at time (a) t = 0, (b) t = 0.27T, (c) t = 0.46T, 
(d) t = 0.67T, (e) t = 0.77T, (f) t = 0.91T 
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SUMMARY 


The goal of this paper is the implementation of hybrid V-cycle hierarchical multilevel methods for 
the indefinite discrete systems which arise when a mixed finite element approximation is used to 
solve elliptic boundary value problems. By introducing a penalty parameter, the perturbed 
indefinite system can be reduced to a symmetric positive definite system containing the small 
penalty parameter for the velocity unknown alone. We stabilize the hierarchical spatial 
decomposition approach proposed by Cai, Goldstein, and Pasciak for the reduced system. We 
demonstrate that the relative condition number of the preconditioner is bounded uniformly with 
respect to the penalty parameter, the number of levels and possible jumps of the coefficients as long 
as they occur only across the edges of the coarsest elements. 

INTRODUCTION 


We shall be concerned with solving the discrete equations which arise when the mixed 
approximate is used for second order elliptic boundary value problems. Specifically, we consider 
the mixed approximation based on the Raviart-Thomas spaces [12]. Such approximations lead to 
the solution of linear systems involving block matrices of the form 

( M N t \ 

V N o )■ 

Here M is symmetric and positive definite and N T is the transpose of the matrix N. This matrix is 
clearly symmetric and indefinite. 

Instead of solving this system directly, we consider solving the penalty approximation to it (cf. 
[1],[5]). This approximation involves the use of a small parameter e (10 -3 ~ 10 -8 in practice) and 
results in a linear system involving the block form 

( M N T \ 

\ N -el ' 
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The linear system of this form can be reduced to the solution of the matrix 

M + e~ x NN T . (1) 

Although the matrix in (1) is symmetric and positive definite, it can have a large condition number 
of the order 0{e~ x h~ 2 ). Here, h is the discretization parameter. 

The hierarchical space decomposition method proposed in [8] reduces the above condition 
number to the order 0{h~ x log j-J. That is, the dependence of the penalty parameter e has been 
removed and a reduction in the mesh dependence has been achieved. In the same paper [8], a 
negative result for the standard application of the multigrid method to the reduced system has 
been suggested. The asymptotic behavior for the standard multigrid method remains of the order 
0(e- l h~ 2 ). 

In this paper, we stabilize the hierarchical spatial decomposition approach from [8] by allowing 
hybrid V-cycle type multilevel iterations developed by Axelsson and Vassilevski (cf. [2], [3], [13], 
[14]). This means that we use a pure F-cycle iteration at most of the levels while we perform a 
//-fold (// > 1) cycle iteration at levels whose index is proportional to a fixed integer parameter k Q . 
We demonstrate that the hybrid V-cycle hierarchical multilevel preconditioners constructed in this 
manner give relative condition numbers that are uniformly bounded with respect to both the 
penalty parameter £ and the number of discretization levels if k 0 is sufficiently large and v (the 
number of recursive calls at every ko level) satisfies certain inequalities determined only by k 0 . 

Finally, we note that there are other approaches suggested in Bramble, Pasciak, and Xu [6], 
Ewing, Lazarov, and Vassilevski [9], Mathew [11] for indefinite systems that arise in mixed finite 
element discretizations of second-order elliptic problems. Some of these methods are based on 
reducing the indefinite systems by working in divergence-free finite element spaces to obtain a 
system with a symmetric and positive definite matrix. 


STATEMENT OF THE PROBLEM 


Let fibea two-dimensional polygon and consider the following boundary value problem: 

f —V • (&Vp) — /, in fl, 

\ p = 0, onr = dfi, 


( 2 ) 


where / 6 L 2 (0) and k = k(x) (x £ 0 is bounded from above and below by some positive 
constants). 

We shall use the following space to describe the corresponding variational problems. We consider 
the Hilbert space 

tf(div; 0) = {v G [L 2 (fi)] 2 | V-vGL 2 (Q)} 

with norm defined by 

ll V llH(div ; n) = II V IIl 2 (Q) + ll^ 7 ' V 1 1 Z, 2 (Q) ■ 
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In (2), we set u = —kVp. Then we obtain the following variational equations: 

f (& _1 u,v) - (p, V • v) = 0, for all v € i?(div; 0), 

| (V-u, 5) = (/,?), for all <7 G L 2 (fl). 

Here (•, •) denotes the inner product in L 2 (Q) or [L 2 (fl)] 2 . 

We assume that we are given two finite dimensional subspaces 

V h C tf(div;fi) and ^Ci 2 (fi) 

defined on a quasi-uniform mesh with elements of size 0(h). The mixed finite element 
approximation of (u,p) is then defined to be the pair, (u /l ,p /l ) € V h x W h , satisfying 

f (k _1 u h , v) — (p h , V • v) = 0, for all v G V h , 

\ -(V-u a , 9 ) = -(/,?), for all ^ G kWh 


Problem (3) can be reformulated in terms of operators. We define operators M : V h — > V h , 


N : V h -»■ W h and N* : W h -> V h by 



(Mv,ij>) = 

i 

<! 

for all G V h , 
for all q G W h , 

(Nv,q) = 

-(V-v, ? ), 

(N*q,v) = 

— (?>V-v), 

for all v € V h . 


With this notation, (3) takes the following form: 


M N* \ ( n h 

N 0 J [ p h 



( 4 ) 


where f h denotes the L 2 (Q) orthogonal projection of / onto W h . 

The solution (u h ,p h ) can be approximated by regularization (i.e., by solving a reduced system 
using a penalty approximation). Let e > 0 be a small (penalty) parameter. We consider the 
solution of the following perturbed system: 


( M N* \ ( uM / 0 \ 

[ N -el){p h E )~{-f h J- 

Eliminating p% in (5) gives rise to the following reduced problem for : 

A £ u a = (m + ^N*N^j Ug = 


( 5 ) 

(6) 


The operator A e is obviously symmetric and positive definite. We note that once has been 
determined from (6), p % can be computed by 

rf = (jVuJ + /‘) . 
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The penalty method was analyzed in [1] and [5] for a class of mixed approximations. It follows 
from these results that, for the Raviart-Thomas spaces [12], 


U ~ | |//(div5tl) + I IP - Pe I U 2 (n) 


< c 


Jl u - v|U(div ; n) + jn£ h \\p - giU=(n) + £\\p\\lhq) 


7 


where the constant C is independent of both e and h. 

Moreover, we note that the problem (4) is indefinite and of saddle-point type. An adequate 
approximation can be provided if the finite element spaces V h and W h satisfy the Babuska-Brezzi 
stability condition (cf. Babuska [4] and Brezzi [7]). This means that for some positive constant (3 
independent of the mesh parameter h the following stability condition holds: 


(V • v, g) 

SUp — y 

veV k ||V||ff(div;f2) 


>m\i 


for all qeW h . 


In the remainder of this section, we describe the Raviart-Thomas spaces on the triangle T . The 
Raviart-Thomas space of order r (a given nonnegative integer) on the triangle T for the velocity is 
defined by 

V h (T) = {ve [P r (T)] 2 © v 0 } , 

where 

( xi P r {x) \ 

° \ x 2 P r (x) ) 

and P T (x) is a homogeneous polynomial of degree r. The corresponding space for the pressure is 
given by 

W h {T) = P r (T), 

where P r (T ) is a polynomial of degree r defined on the triangle T. We also consider the projection 
operator tc^ that is defined by the following: 


(^v • n, q)E = (v • n, q)E, for q G P r (E) and all three edges E of T, 
(»TfcV,^)r = (v,^) T , for G (Pr-i(T)) 2 . 


HIERARCHICAL SPACE DECOMPOSITION METHOD 

In this section, we shall describe the hierarchical spatial decomposition method [8]. We start 
with a coarse initial triangulation To of the domain 0. For any element T G T 0 , we consider the 
local ellipticity constants 

sup k(x ) 

_ _ zg T 

T inf k(x) 

x£T v j 
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and 


a = max <7r. 
Ter 0 


Note that a can remain close to 1 even when the coefficient k(x) has large jumps, as long as these 
occur only across edges of elements from T 0 . 

We next construct a nested family of triangulations 

% ,Tj = %, 

of the domain ft by subdividing each element of Tj into four congruent ones to obtain 7} +1 . We 
consider the Raviart-Thomas velocity space Vj for every triangulation Tj (with mesh size 
hj = 2~ J h 0 ). For each level j = 1, 2, • • • , J, we let 

TTjV = 7 T Aj .V, 

where 7 r^. is the projection operator defined in (7). For convenience, we shall let 7r_i = 0. 

We define the spaces Mj to be the images of the operators (7 vj — 7Tj_i) acting on elements from 14 

Mj = {w = ( 7 vj - 7Tj_i)v, all v £ 14} . 

It is clear that {Mj} are subspaces of Vj = 14 satisfying 

Vj = j = 0,l,...J. 

For j = 0, 1, ... J, we define the operator Aj to be 

( Ajv,w ) = A s (v,w), for all v,w G Vj. 

We next define the operators Dj to be A e M . That is, 

(Dji>,6) = A s (il>,0), for all if), 6 e Mj, 

where if) = (Tj — 7r j-i)v and $ = ( 7 Tj — 7Tj_i)w for some v, w e Vj. 

The primitive form of the hierarchical preconditioner proposed in [8] can be written as 

J 

(Bjv, w ) = (B 0 tt 0 v, 7r 0 w) + (D <r if) a ,9 a ) , 

cr— 1 

where Bo = Ao, if) a = (t 0 — and 6 a = (-K a — 7r a _ 1 )u;. To obtain an efficient preconditioner 

Dj for Dj (cf. [8]), we use the decomposition 

if) = lf) H +tf) P , 

where if) H £ Mj is defined element-wise for any T £ Tj- 1 such that 
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j A £ (i]> H ,0) = 0, for all 6 £ Mj , and 6 ■ n — 0 on dT; 

1 ibzr ■ n = il) • n 

\ T n dT dT 

Let D ^ be an appropriately scaled diagonal part of Df such as 

(Afv>v) =C E h )-i k E E (v-«U) 2 (x s ) 

all edges B {l 3 }’'_ n a *«* 

of all rer^j of no %° on E 

for some constant C > 0 independent of hj- 1 and for some weights 

ks = — j- H r, where T\ Pi T 2 = E and Ti,T 2 € 

max A; max k 
Tj t 2 

Then we can write Dj as 

(Djihx) = {d^hiXh) + (Dfij>p,Xp) ■ 

The final form for the hierarchical preconditioner becomes 

J 

(Bjv,w) = (B 0 kov, tt 0 w) + {poi>c,0<?) ■ (8) 

<7 = 1 

We now state the following theorem for the hierarchical preconditioner [8] without proof. 

Theorem 1. For any vector function v £ Vj, we have that 

C2~ J Bj(v,v) < A £ (v,v) < CJB?(v,v), 

where C is a constant independent of e, J , and the mesh size. 

The above theorem shows that the hierarchical preconditioner can be used to effectively 
precondition the original form A £ as long as J is not too large. 

HYBRID V-CYCLE MULTILEVEL PRECONDITIONERS 

We shall describe the hybrid V- cycle multilevel preconditioners in this section. The construction of 
these multilevel preconditioners is based on the hierarchical preconditioner (8) and some 
polynomial acceleration techniques proposedTn [2], [3], [13], and [14]. The purpose of the 
polynomial acceleration is to stabilize the growth of the condition number for the hierarchical 
preconditioner. The hybrid V-cycle multilevel preconditioner Bj is defined by recursion as follows. 

« Let p u {t) be a given polynomial of degree v > 1 such that 

(i) p„{ 0) = 1, 

< 

(ii) 0 < p u (t) <1 for t £ (0, 1]. 
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s For a given integer parameter &o > 1, we set 

1. Bo — Ao , 

2. ( B ko v,w ) = (BoTr 0 v,ir 0 w) + 53 

< 7=1 

• For s = 1, 2, . . . , m = sfco, and for all j such that m < j < m + ko, we first define operator 
i? m to be 

m 

(■ B m V , w) = (B 0 ir 0 V, 7T 0 w) + (ArVv, Oc) (Vu, to G V m )- 

< 7=1 

Then the operator Bj is obtained for all v, u; € Vj by the relation 

j 

(■ Bjv , u>) = (5 m 7r m n, 7r m n;) + 52 (Ar(ff«rV - n<r-iv), (ir„w - . 

cr=m4-l 

We next present some technical lemmas which are used to prove the convergence results for the 
hybrid V-cycle multilevel preconditioners. We will state these lemmas without proof. We refer to 
[10] for detailed proof. 

Lemma 1. For any function v €E Fj+fc 0J we have that 

A £ (7TjV, 7TjV) < T)(k 0 )A £ (v,v), 

where rj(k 0 ) = C2 k ° and the constant C is independent of j and the penalty parameter e. 

Lemma 2. Let m and j (> m) be given integers. The following inequality holds for some constant 
Sm ~ 0: 

(A m v,v) < (B m v,v) < (1 + S m )(A m v,v), for all v G V m . 

Lemma 3. The following spectral equivalence relation holds for all v £ Vj: 

JZ~yfl( A i v ^ v ) ^ {Bjv,v) < (8 m ri(j - m) + r)(j - m) 

+ br l( l ) 13 Vti ~ ^(AjVtv). 

<r==m+l ' 

We shall use the following polynomial p v (t) for the preconditioner: 

1 + 
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with 


( 9 ) 


v > yj(k> + l)l)(*o)- 

Here T v is the Chebyshev polynomial of degree v. 

Let a be a small positive parameter satisfying 


sup 


1 




G [&, 1] 


1 2 


(l-V^ + fr + yE)' 


(l + Va) 


cr—1 


< 7=1 


a 


where the parameter a - , 

&o + l 

We note that such a parameter exists under the above choice of v because in this case we have 
the following asymptotic relation: 


T 2 


(i - v&y + (i + v&y 
2Vs£(i-%/S) , ~(i + vS) 


(7 — 1 




and for a sufficiently small a we solve the inequality for v 


1 1 
< 


v 2 a arj(ko ) 


Let A j be the largest eigenvalue of A J 1 Bj. An upper bound for A m+ & 0 is given as follows. 


Lemma 4. 


'm+A:o 


~ v{ko)snp \T^ry te 


a 


ko + 1 


,1 


+ hr l{ 1 ) J2 V( a ) ^ 

(7 = 1 


a 


( 10 ) 


The multilevel preconditioner Bj will be spectrally equivalent to Aj. We summarize these results 
in the following theorem. 


Theorem 2. Let v satisfy the inequality (9) for some given integer parameter k 0 > 1. For 
a G (0, 1] that is sufficiently small and satisfies the inequality (10), the following spectral relation 
between Bj and Aj holds uniformly for j > 0: 

7— TT ( A i v ’ v ) ^ i B i v , v ) < ~{ A i v , v ) f° r al1 v € Vj. 

Kq + 1 a 


446 



COMPUTATIONAL COMPLEXITY OF THE PRECONDITIONER 


To study the computational costs, we denote the degrees of freedom at level j by rij. From the 
triangulation process, we may assume that nj+i /rij = 4. Let Ws+i be the number of arithmetic 
operations performed at level (s + 1 )k 0 . We then obtain the following recursive formula: 

W s+ i = Cn( s+1)ko + J'Ws, (11) 


where the second term on the right-hand-side stands for v recursive calls of the preconditioner B sko 
(the polynomial corrections at level sk 0 ). Thus, the computation of this action is 


B 


-i 

sko 


\I-Pu 


^0 + 1 


B sko 



( 12 ) 


We note that the first term on the right-hand side of (11) stands for the work to invert the 
block-diagonal matrices Df and Df and v actions of the matrix A sko involved in (12). Thus, in 
general, C is a function of the parameter v and ko (C = C{v , k 0 )). To obtain an optimal order 
preconditioner in terms of computational complexity, we choose v and k 0 such that 


W s+ i < const n( s+ i)fc 0 . 


Using (11) recursively, we obtain 

W s+ i = C («(,+ i)fc 0 + vn sko + . . . + v s n k ^j + r/ s+1 >Vo 

v. \ s+1 Wo 


= Cn 


(s-t-l)fco 


22fc 0 


Wo A 

) n 0 


n 0 £T=0 

Hence the condition for an optimal order preconditioner is 

7 / 

< 1. 


22fco 


22fc 0 


This is the constraint for determining the parameters v and k 0 to be an optimal hybrid V-cycle 
multilevel preconditioner. 

In order to make Bj spectrally equivalent to Aj as given in Theorem 2, we need to impose 
another constraint for choosing parameters v and ko as follows: 


v > yj{k 0 + l)r)(k 0 ). 

Therefore, we establish the following relation for the parameters v and k 0 to guarantee both the 
optimality and the spectrally equivalent property for the hybrid V- cycle multilevel preconditioner. 
The relation reads as follows: 

2 2k ° >v> C\ja(ko + l)2 fc °. (13) 
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These relations can always be satisfied because ko can be sufficiently large. We summarize the 
above results in the next theorem. 

Theorem 3. The hybrid V -cycle preconditioner Bj with the parameters specified above gives an 
optimal order CG method if v and k 0 satisfy the inequalities (13). 

IMPLEMENTATION OF THE PRECONDITIONER 


We first consider the hybrid C- cycle multilevel preconditioner in following matrix form: 

(1) For k = 1, set 


MW = AW. 


(2) For k = 2 to J, we define 
MW = 


r A {k) 
Ali 

0 

' I 

aW 1 A k ) ' 
Ai -A 2 

A fc ) 
L A21 


. 0 

I 


where 


f Af^- 1 ) = k^sko + l\ 


(14) 


k — sko T 1) s = 1, 2, • ■ • , J j k(j — 1. 

Here, p„(t) is a polynomial of degree p > 1 such that ^(O) = 1 and 0 < p v (t) < 1, t G (0, 1]. 

To solve systems defined by M = M^ J \ we use the following multilevel iteration (AMLI) from 

[3]. Let pl s \s = 1,2 , • ■ • , J, be given polynomials of degree v such that p^(0) = 1. Let polynomial 
of degree p — 1 be 

Qi s) = (1 - pP)^ 1 = ?o 5) + ql s) t + • • • + v = p( s l 

For a given vector d = d( J ), the AMLI gives 

c( J ) = 

= [/ - p^MW-'aW)] A( J ) _1 d( J ). 

In particular, for the case pj = 1 (i.e., p\?\t) = 1 — t), we have simply 

M( J ) _1 d( J ) = c( J ). 
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Algorithm AMLI. Given a set of polynomials 
such that p(*)(0) = ,1 , we set 

Qi s) = ?o° + S) t + • • • + ql-it 1 '” 1 , v = v a . 
Then, for any vector d = d^, the AMLI gives 

c = [i- Pi/ (m^~ 1 a^ ) )]a^~ 1 S j '> 

in the following steps. 

(0) initiate 

for k = 1 to J set a(k) = 0; 
k = J- 

(1) a(k ) - a{k) + 1; 

if a(k) = 1 then 

vW = 0, W = q™, dW; 
else 

W = (fc) d(A) + 

(2) vf> = A[f "Vj; 

(3) d^- 1 ) = W 2 - 4?Vi; 

(4) k := k — 1 

if k > 0 go to (1); 

(5) solve on the initial level 

(6) set 

v‘‘ +1) = v<‘>; 

(7) v<‘ +1 > = v<* +1 > - 4? +1 l' I 4t +, )v<‘ +I) ; 

(8) k := k + 1; 

if <r(k) < Uk go to (1); 

(9) &(k) = 0; 

if k < J go to (6); 

(10) c( J ) = d( J ). 
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NUMERICAL RESULTS 


We present numerical results of the hybrid U- cycle multilevel preconditioners for the following 
two-dimensional discontinuous coefficients problem on the unit box O=(0, 1) x (0, 1). In all 
experiments, the lowest order Raviart-Thomas triangular element is used. We consider the model 
problem given in (2), where the diffusion coefficients are assumed to be piecewise constants on the 
coarsest grid triangles. As a consequence, both local and global elliptic constants or and a are 1. 
In particular, we give the numerical results for the 32-subdomain case with k _1 in each subdomain 
as shown in Figure 1. 



Figure 1: the coefficient A: 1 on each subdomain 

For each preconditioning step, we note that a set of polynomials of degree v = v s 

Qi s) = q ( o S) + q[ S) t + • • • + q[ s) t\ s = 1, 2, • • • , J 

is used in the AMLI algorithm. These polynomials are specified by the following set of positive 
integers: 

which are the degree of the polynomials for each level. Here level 1 and level J(= 6) are the 
coarsest level ( h 0 = 1/4) and finest level [h = 1/128), respectively. We note that v\ is always chosen 
to be one, and that the coarsest grid problem is always solved by the CG method to the machine 
precision e mac h- 

During the preprocessing stage, for k = 2, 3, • • • , 6, we first estimate the extreme eigenvalues of 
[M^ x Ak-i] by PCG iterations (the convergence criterion is that the reduction of the energy norm 
for residuals is not less than or equal to 10~ 6 ). 
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Suppose that 


A [M^Ak-i] C [c,4 

for some constant c and d. Then the polynomial Q ^ is computed by the formula 

Q m = h iM), 


where 


Pi(t) 


p„(t) = 


1 

1 + T„[(c + d — 2t)/(d — c)] 


v = 2, 3, ■ 


1 + T v [{c + d)/(d — c)] 

We see that this step can be done by table lookup since v is a small number {v € {2,3}) in practice. 
We refer to the set of polynomials by the set of degrees 


{v x = 1, i/ 2 = v, v z = v, • ■ ■ , vj = 1} 


We perform numerical experiments for the following cases 

(a) (1,1, 1,1, 1,1), 

(b) (1,1, 2, 1,1,1), 

(c) (1,1, 3, 1,1,1), 

(d) (1,1, 2, 1,2,1), 

(e) (1,1, 3, 1,3,1), 

(f) (1,2, 1,2, 1,1), 

(g) (1,3, 1,3, 1,1), 

(h) (1,2, 2, 2, 2,1). 

All experiments were performed in one of the research computing facilities at the National 
Chung Cheng University. The LINPACK benchmark of the machine is about 22 mflops and the 
machine constant e mac h € (10 -15 , 10~ 16 ). We measure the CPU time for both the preprocessing 
stage and the PCG iteration stage. We note that there is no preprocessing time for case (a) since it 
corresponds to the pure U- cycle hierarchical method [8]. 

We perform each case 5 times on the same machine and get the average time and condition 
numbers. The results are given in Table 1. We note that the set of polynomials (h) gives the best 
result for the condition number, although it is the most expensive. In addition to case (h), both (d) 
and (e) give very good results for the condition number. Also, most cases are less expensive than 
the pure V - cycle in terms of computing cost. 

In Table 2, we present the results for the V-cycle and case (d). The results show that both cases 
have uniform condition numbers and computing times independent of the penalty parameter e. 
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M 

preprocessing 

PCG iteration 

total time 

K 

(a) 

(1,1, 1,1, 1,1) 

0 

34.33 

34.33 

82.3 

(b) 

(1,1, 2, 1,1,1) 

.64 

28.90 

29.54 

45.3 

(c) 

(1,1, 3, 1,1,1) 

.64 

29.98 

30.62 

39.5 

(d) 

(1,1, 2, 1,2,1)' 

3.66 

25.49 

29.15 

17.5 

(e) 

(1,1, 3, 1,3,1) 

4.18 

32.03 

36.21 

12.5 

(f) 

(1,2, 1,2, 1,1) 

1.89 

26.50 

28.39 

28.1 

(g) 

(1,3, 1,3, 1,1) 

2.65 

33.23 

35.88 

22.9 

(h) 

(1,2, 2, 2, 2,1) 

8.01 

32.09 

40.10 

10.5 


Table 1: computing time and condition number k 



£=.001 

T—i 

o 

o 

o 

II 

<0 

t—H 

o 

o 

o 

o 

II 

to 

(^1, ^2, ^3, ^4, ^5, V&) 

K 

CPU time 

K 

CPU time 

K 

CPU time 

(1,1, 1,1, 1,1) 

85.3 

36.37 

84.7 

34.33 

82.9 

34.46 

(1,1, 2, 1,2,1) 

18.1 

27.91 

17.5 

25.40 

17.8 

26.14 


Table 2: comparisons of (1,1,1,1,1,1) and (1,1, 2, 1,2,1) for various e 


However, the condition number for the case (d) is independent of the number of levels (there are 
currently six levels) while the condition number of the U- cycle does grow with the order 
0(h~ x log )r) (cf. [8]). Also, the computing cost for the case (d) is quite small compared to that 
required for the V-cycle. 


CONCLUSIONS 


Based on the idea of a hierarchical block preconditioner proposed by Cai, Goldstein, and Pasciak 
[8], we develop hybrid U-cycle multilevel preconditioners that give relative condition numbers that 
are uniformly bounded with respect to both the penalty parameter e and the number of 
discretization levels J if we choose proper values for ko and u. The numerical results confirm the 
uniform convergence behavior for the hybrid U-cycle multilevel preconditioners, 
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A CONFORMING MULTIGRID METHOD FOR THE PURE 
TRACTION PROBLEM OF LINEAR ELASTICITY: 
MIXED FORMULATION* 


Chang-Ock Lee^ 
Department of Mathematics 
University of Wisconsin-Madison 


SUMMARY 


A multigrid method using conforming P-1 finite element is developed for the two-dimensional 
pure traction boundary value problem of linear elasticity. The convergence is uniform even 
as the material becomes nearly incompressible. A heuristic argument for acceleration of the 
multigrid method is discussed as well. Numerical results with and without this acceleration as 
well as performance estimates on a parallel computer are included. 


1. INTRODUCTION 


Let L? be a bounded convex polygonal domain in R 2 and d£l = (JF=i I\-. The pure traction 
boundary value problem for planar linear elasticity is given in the form 

/ inO, (1) 

9i, 1 < i < n , (2) 

where u denotes the displacement, / the body force, the boundary traction, \i > 0, A > 0 the 

Lame constants, and v is the unit outer normal. In addition, the Lame constants (/i, A) belong 

to the range x [Ao, oo) , where /x 2 , Ao are fixed positive constants. The explanation 

for the notations used in (1) and (2) is given in [4, 6]. 

It is well-known that finite element method using conforming piecewise linear (P-1) finite 
elements converges for moderate fixed A, and as A — )■ oo, i.e., the elastic material becomes 
incompressible, it seems not to converge at all ([1, 10]). In order to overcome this so called 
locking phenomenon, the method of reduced integration was employed by Brenner [4], Falk 

‘This research was partially supported by the National Science Foundation under Grant No. CDA-9024618. 
* Current address: Department of Mathematics, Inha University, Inchon, Korea 


— div |2/i e (u) + A tr ^e(u)J 5 j = 
2/t e(u) + Atr ("c(«)') s) p,|ri = 
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[6] and Lee [7] in the construction of finite element methods. The finite element methods 
employed by them are robust in A, i.e., they give a uniform convergence rate as A oo. In [4], 
Brenner proved the convergence of the P-1 nonconforming finite element method for the mixed 
formulation and robustness in A using a modification of the space used by Falk in [6] . In [7] , Lee 
proved the convergence of the P-1 conforming finite element method for the mixed formulation 
and robustness in A using the same modification of the finite element space as Brenner used in 
[4]. In addition, Brenner adopted the W-cycle full multigrid method as a numerical solver for 
the resulting linear system and obtained the convergence of a multigrid method, which is robust 
in A. For mixed problems without penalty term (e.g. Stokes equation), a W-cycle multigrid 
algorithm was developed by Verfiirth [9]. 

In this paper we present a W-cycle multigrid method to solve the linear system arising from 
P-1 conforming finite element method for the mixed formulation of the pure traction boundary 
value problem developed in [7]. We show that the convergence is uniform with respect to A 
by following the argument adopted by Brenner in [4], While the conforming multigrid method 
has the same order of convergence as the nonconforming multigrid method in [4], the former 
has about one third of the unknowns for the same mesh size. Moreover in the case of parallel 
computation the intergrid transfer operator of the conforming multigrid method is easier to 
design and has smaller communication overhead than the nonconforming one. Therefore, the 
conforming multigrid method promises better performance in the cases of both sequential and 
parallel computations. In addition, we may use this conforming multigrid method as the coarser 
grid correction in the multigrid algorithm for the P-1 nonconforming discretization. It gives 
the same convergence rate and robustness as the conforming multigrid method. In practice, 
V-cycle multigrid methods employing one smoothing step are convergent. Even though the 
P-1 conforming multigrid method is robust with respect to A, the convergence is slow in the 
practical sense. Investigating the relation between eigenvalues and norms of corresponding 
normalized eigenfunctions (u,p) we have found that an unusual bimodal distribution of || u ||#i 

vs. the eigenvalues. Based on this insight, we present a heuristic argument for a faster multigrid 
algorithm employing a weighting factor and a damping factor. Experimental results indicate 
the effectiveness of these two factors. 

This paper is organized as follows. In Section 2 we explain the conforming finite element 
method we employ. Conforming W-cycle multigrid method is discussed in Section 3. In Section 
4 we give the numerical experiments for V-cycle multigrid methods on CM-5. Also we give 
the performance estimate on a parallel computer. In the last section we discuss about the 
acceleration of multigrid algorithm and give numerical results. 


2. THE FINITE ELEMENT METHOD 


Throughout this paper, the letter C denotes a positive constant independent of the Lame 
constants and the mesh parameter hk , which may vary from occurrence to occurrence even 
in the proof of the same theorem. For the notations of several standard differential operators, 
refer to [4, 6]. 
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In order for a solution of (1) and (2) to exist, / and gi must satisfy the compatibility 
condition 

J f • v dxdy + ^ J gi ■ v ds = 0 V v G RM , (3) 

where RM, the space of rigid motions, is defined by 


RM := | u : u = (a + by,c - bx), a, b, c G R j . 

When this compatibility condition holds, the pure traction boundary value problem (1) and (2) 
has a unique solution u G where 


H k , 


(f2) := G H k (Q) : J u ■ v dxdy = 0 V v G Rm| . 


(See [4] or Chapter 3 of [7] for more detail.) Here, H k (Q), k> 0, denotes the usual L 2 -based 
Sobolev spaces of vector- valued functions (See [5]). 

Henceforth, taking J = and p = ydivu, we consider the mixed weak formulation for (1) 
and (2) as follows: 

Find (u,p) G H^Q.) x L 2 (Q) such that 


/ e(u) : e(v) dxdy + / p(div v) dxdy 

Jo, ® ~ “ Jo, 

f (divu)qdxdy f pqdxdy 

Jo. 1 Jo. 

for all ( v,q ) G H 1 ^^) x L 2 (fi). 


2/x 


= 0 


/ f-vdxdy + Y] gi-v\ ri ds , (4) 

Jo ~ fr[Jr t ~ 

(5) 


Replacing p and q by y/uJp and yd q (w > 1), respectively, we obtain the following 
formulation which is equivalent to (4) and (5) : 

Find ( u,p ) G H ^(f2) x L 2 (Q) such that 


Bu ((«,?),(«,?)) = ^ J^f-vdxdy + j^J^gi-v | r , 


ds\ 


( 6 ) 


for all (v,q) G H^Q) x L 2 (Q.), where 



The quantity oj is called the weighting factor. Equation (6) has a unique solution on 
H]_(Q) X L 2 (Q). (See [4] for more detail.) 
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Let {T k } be a family of triangulations of Q, where T k+1 is obtained by connecting the 
midpoints of the edges of the triangles in T k . Let hk := max re7 -fc diamT, then hk = 2hk+i- 
Now let us define the conforming finite element space for our multigrid method CMG. 

Wk := {u : u Ir is linear for all T £T k , u is continuous on fi} , 

Wk := [u<EWk'- J Q V:-vdxdy = 0 Vu € RM 

To describe the mixed finite element method, we define 

Q k := {q : q € L 2 (fi) and q\x is a constant for all T £ T k } . 

For the definition of nonconforming finite element space, see [4, 6]. 

For each k, define the bilinear form B Wt k on X L 2 (0) by 




where Pk-i is the L 2 -orthogonal projection onto Qk- 1 - Now, we have a conforming 
discretization of (6) , which are modifications of one proposed by Falk in [6] : 

Find ( Uk,Pk ) € Wk X Qk-i such that 


B u ,k ((«*, Pk ) , (g, ?)) = J a l'~ dxdy + f^J r ~ ds 


for all (v, q ) e X Qk- 1 - 


( 7 ) 


In Chapter 3 of [7], proving the analogue of the classical lemma for the existence of an 
inverse of the divergence operator Lee showed the uniqueness of the solution of the conforming 
discretization (7) with u = 1 and derived the following discretization error estimate: 

II U ~ + hk u ~ + | |p - Pk\\L 2 (n) 

< Ch\ j|| / ||L 2 (ft) + X] II 5*'llff 1 / 2 (r,)| • 

In [4], Brenner showed the uniqueness of the solution of the nonconforming discretization and 
derived a similar discretization error estimate. 


3. THE CONFORMING MULTIGRID ALGORITHM 


In this section we present lemmas and theorems without proofs which are found in Chapter 
4 of [7]. We set u = 1 for the time being until we have a statement for uj > 1. Let B = B% and 

B k = Bi,*. 
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Define the mesh dependent inner product by 


((»>?)> (&?)) : = (&U)L2 {a) + h 2 k (p,q) L 2 {n) . 

The intergrid transfer operator I k ~ x : Wk x Qk-i — > W’fc-i x Qk- 2 is defined by 

\ / k ~~~ X ' ' k 

for all (u,p) eW k X Qk- 1, and (v,q) G Wfc-i X Qfc-2- 
Lemma 1 /| _1 : x Qk - 1 — > x < 2^-2 . □ 

Define B k :Wk* Qk- 1 -►Wax Qfc_i by 

^B*(tt,p)>(2»?)) = B k ({u,p), (jj.?)) V(u,p), (w,g) € W* x Qk- 1 • 

Lemma 2 Bk : x (Qfc-i -> x Qfc-i • D 


Let — Bk\wj- X Q k _ l - 


Lemma 3 The spectral radius of B k < Ch k 2 . □ 


The mesh-dependent norms on W k x Qk - 1 are defined as follows: 


ffl(«.P)l»,* : = \J(( B k 2 Y /2 (&l»)) fc v (H.p) G X Qfc-i . 

Define P ^ 1 : If £ x Q*_i -»• x Q *_ 2 by 

Bk- 1 (v.g)) = B k (( «,p ), («,?)) 

for all (u,p) € W - * x Qk- 1 and (u,g) €Wk-i x Qfc-2- 

The fc-.th level iteration scheme of the conforming multigrid algorithm: The fc-th 
level iteration with initial guess (yo,zo) 6 lf[ x Qk- 1 yields CMj^, (t/o, 2 o)i(w,r)) as a 

conforming approximate solution to the following problem. 
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Find (y, z) € W k x Qk-i such that 


B k{Vj z ) = where (™> r ) e Wk X Qfc-i • 

For k = 1, CMG( 1, (yo> 2o)> (u>, r)) is the solution obtained from a direct method, i.e., 

CMG( 1, (y 0 , z 0 ), (u>, 0) = ( B i) 1 (™> r ) • 


For k > 1, 

Smoothing Step: the approximation (y m ,z m ) € W k x Qk-i is constructed recursively from 
the initial guess (t/o> z o) and the equations 

(yi,2l) = (m-uzi-i) + j 2 B k((w,r) - Bkiyi-uzi-i)), 1 < l < m. 

Here, A k := Ch k 2 is greater than or equal to the spectral radius of B k , and m is an 
integer to be determined later. 

Correction Step: The coarser-grid correction in W k -i x Qk - 2 is obtained by applying the 
( k — l)-th level conforming iteration twice. In other words, it is the standard W-cycle 
multigrid method with y = 2. More precisely, 

(v 0 ,qo) = (0,0) and 

(v,, qi) = CMG(k - 1, (vi-i, q,-i), (w, f)), i = 1, 2 
wher6 (w,r) <E W£_ x X Q k -2 is defined by (w,r) := I ^ 1 ({ w,r ) - B k (y m ,z m )\. 


Then 


CMG(k, (y 0 , z 0 ), (w, r)) = (y m ,z m ) + (v 2 , q 2 ) ■ 


Let the final output of the two-grid algorithm be 

(y*,z*):=(y m ,z m ) + (v*,q*) 

where 

(v*,q*) = (Bi_ x y 1 I k k ~ lB k{y -y m ,z- z m ) . 
Lemma 4 ( v * , q*) = P £ -1 (y-y m ,z-z m ). □ 
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Let 


Rt:=! ~~k (Bk)2 ■ 

Then we have 

( V ~~ Dm > Z ~ z m) — R}c {y~yO,Z- z o) ) 

(y-y*,Z-Z*) = (i-Pt^RTiy-yoiZ-zo). 

Lemma 5 (Smoothing Step) There exists a constant C, independent of hk and m, such 
that 

m(u,p)h,k<Chf~\\(u,p)\j 0 ,k V(«,p) <E Tffc X Qk- 1 . □ 


Lemma 6 (Approximation Step) There exists a constant C, independent of hk and m, 
such that 


I - Pt'KhP) Ilk* < ChlUu,p)\h,k V(«,p) € Wi x Qk- 


□ 


Theorem 1 (Convergence of the Two-Grid Algorithm) There exists a constant C, 
independent of k and m, such that 


\l(y-y*, z - z *)\lo,k < -j=\l(y- yo,z - ^o) lllo,* 


□ 


Theorem 2 (Convergence of the k-th. Level Iteration) There exists a constant C, 
independent of k and m, such that 


II! (y, z) - CMG(k, (y 0 , z 0 ), ( w , r))| 0> jfc < 


C 


m 


-yo,z- zo)|||o,fc- n 


4. EXPERIMENTAL RESULTS 


For our numerical experiments, we choose the model problem studied in [4]: 


— div < e (u) + A tr ( e (u) ) 8 f 

~ 1 25 S3 I 

(|(h) + Atr (e(u)^ S^j ufa 


f in = unit square , 
9i, 1<*<4, 
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where T, (1 < i < 4) represents four sides of the unit square. The body force / = (/i, /t) 1 is 
defined by 

1 
A 
1 
A 


/i = -7r 2 sin Tra: sin Try + 2 tt 2 ( + 1 ) cos Tra: sin Try , 

f 2 = -<7r 2 cos Tra: cos Try + 2 tt 2 ( + 1 ) sin Tra: cos Try 


and the boundary tractions are defined by 

t 


7T 


-r cost ra:, 0 ) , 
A 


9 1 = 

~ v ^ 

93 = (-^cosTra:,0^ , 

The exact solution u = («i,« 2 )* £ H\(Q) is 


7T 

5-2 = 1 7T Sin Try, --cos Try , , 


TT 

y 4 = ( tt sin Try, — — cos Try 
A 


u\ = I — sin Tra: + — cos ttx I sin Try + 


TT* 


U 2 = [ — cos Tra; + — sin Tra; ] cos Try . 

A 


First, we describe the implementation of conforming multigrid method CMG. Let $ be 
the piecewise linear function which equals 1 at exactly one vertex pi and equals 0 at all other 
vertices of T € Tk and tpf be the piecewise constant function which equals 1 on exactly one 
triangle T,- and equals 0 on all other triangles of Tk- Then 

= {(^,0,0), (0,^,0), (O.O,^- 1 )} 


forms a conforming basis of Wk X Qk-i- The matrix representation of Bk with respect to 
the basis }i<i<n fe in the CMG algorithm is equal to M^ 1 Sk where Mk is the mass matrix 

and Sk is the stiffness matrix. Let E^~ l be the matrix representation of the intergrid transfer 
operator j£ -1 . Then we have 

Et 1 = M^EtiYMk 

where E^_ 1 is the matrix representation of the natural imbedding from W^-i x Qk-2 into 
Wk x Qk-i ■ Let Xk be the vector space which consists of the coefficients of the functions in 
Wk X Qk- 1 with respect to the basis }i<;< n)t . Similarly we define as the equivalent 
vector space to Wfc x Qk-i- With the compatibility condition (3), the CMG algorithm can be 
rewritten in matrix form for the following problem: 


Find ( Y,Z ) € Xfc such that 

(M^S t )\ x pY,Zf =(W,R)\ 


where (If. R)‘ 6 Xt . 



For k — 1, CMG( 1, (Yo, Zo), (W, R)) is the solution obtained from a direct method, i.e., 
CMG( 1, (Yo,Z 0 ), (W,R)) = (Mf 1 5 1 )|- 1 L (iy, J R) . 

For k > 1, 

Smoothing Step: the approximation (Y m , Z m ) € is constructed recursively from the initial 
guess (Yo, Zo) € Xfc and the equations 

{Yu Zi) = (y,_i, Z,_i) + ^['^((If, i2) - M^SkiYi-uZt-!)), 1 < l < m . 

Here, A* := Ch is greater than or equal to the spectral radius of (M^ 1 S k )\ x ±, and m 
is an integer to be determined later. 

Correction Step: The coarser-grid correction in X^__ 1 is obtained by applying the ( k — l)-th 
level conforming iteration twice. In other words, it is the standard W-cycle multigrid 
method. More precisely, 

(Vo,Qo) = (0,0) and 

(VuQi) = CMG(k - 1, (W, #)), » = 1,2 

where {W, R) € X^_ l is defined by 

(W, R) := E \ - 1 ({W, R) - M^S k (Y m ,Z m )^ . 


Then 

CMG{k, (y 0 , Zq), (W, R)) = (Y m , Z m ) + E$_ X {V 2 , Q 2 ) . 


With respect to the basis {^f}i<»<n fc the mass matrix M k has seven entries per row so that 

it is costly to take inverse of M k in the implementation of the algorithm at each level of the 
multigrid. In practice, we replace M k by an appropriate N k satisfying 


(i) M k and N k are spectrally equivalent, i.e., there is a constant /3, independent of h k , such 
that 

{N k U, U)i 2 

yu ~ eXk ' u ~*°~- 


(h) 

(iii) 


Nr'Sk : X k Xt 


N^iEtiYNk : X k ->■ Xi_, . 
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The conditions (ii) and (iii) are essential because the solution of our problem should lie in X k . 
In the smoothing step, instead of A*, we use h-N k ,k which is the spectral radius of (N k l Sk) l^-x 
and by Lemma 3 we have 


Spectral Radius of ( N k < Ch k 2 . 

The multigrid algorithm CMG' is convergent with respect to the norm 

1 • |o,fc = y/(-,-)k = \J(Mk-, -)i 2 ■ 

By slight modification of the proof of the convergence theorem of the CMG algorithm, we 
obtain the convergence theorem of the multigrid algorithm containing Nk instead of Mk with 
respect to (Nk-, -)]^ 2 which is equivalent to ||| ■ fo,*. See [2] for more detail. For this specific 
experiment on the unit square we take Nk = diag(M&) as suggested in [2], which allows the 
use of an under-relaxed Jacobi scheme of smoothing. Most rows of the stiffness matrix Sk have 
sixteen entries so that most rows of N k 1 Sk have again sixteen entries. Note that the matrix 
representation for I £ _1 has again seven entries per row. In the coarsest grid we use a direct 
solver for the (6 x 6) linear system which comes from the matrix representation with respect 
to the basis of X±. 

The performance of multigrid algorithms has usually been measured in Work Units. In 
serial machines, since the total CPU time is proportional to the amount of computational work 
and smoothing steps make up most of the multigrid work, a reasonable unit of effort is the 
Work Unit (WU) defined in [3] as the amount of computations in one smoothing step in the 
finest grid. 


However, in parallel machines (in particular, massively parallel machines adopting data 
parallelism) we use a somewhat different method to measure the computational work. In this 
paper, we use one WU as the amount of computations needed in one smoothing step of the 
conforming multigrid method CMG at the finest grid on a serial machine. Let nk be the 
number of unknowns at &-th level and Qcomp be the number of operations required to compute 
one smoothing step at each mesh point. Then we have 

njQcomp = 1 (WU) 


where J-th level represents the finest grid. Let p be the number of processors and assume 
two-dimensional square data distribution (cf. Chapter 5 of [7]). Then the number of unknowns 
of A;-th level allocated to each processor is 


rk = 


nk 

P 


and 



for k = 1, 




where [x] is the smallest integer greater than x. On a parallel machine we need an additional 
unit to measure the communication work. We define one CU ( Communication Unit ) as the 
amount of communications needed in one smoothing step of the conforming multigrid method 
CMG when we assume a large system of p-nj number of unknowns. Let Q CO mm be the number 
of interprocessor communication steps required to compute one smoothing step at each mesh 
point. Since about 4 y/rk mesh points in a processor do interprocessor communication in the 
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Table I: V-cycle of CMG when h = 1/64 



A = 10 

A = 100 

✓ 

^ = 1000 

smoothing 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

1 

68 

582 

1626 

244 

2073 

5788 

334 

2842 

7935 

2 

67 

, 572 

798 

223 

1894 

2645 

293 

2491 

3478 

3 

66 

559 

520 

201 

1714 

1595 

255 

2169 

2019 

4 

64 

546 

381 

184 

1564 

1092 

226 

1924 

1343 


smoothing step of the conforming multigrid method CMG, we have 

4 y/njQ comm — 1 (CU). 

Let T comp be the time needed to perform the computational work of one smoothing step at 
one mesh point and T comm be the time needed to perform the interprocessor communication 
in one smoothing step at one mesh point. The multigrid algorithms in this paper are one-sided 
method, i.e., it uses the smoothing step before correction step. If smoothing steps are used 
before and after correction step, the multigrid method is called symmetric. Note that as far 
as the convergence is concerned a symmetric V-cycle multigrid iteration is the same as two 
one-sided V-cycle iterations (See [8]). 

The programs execute the multigrid iterations until the discrete L 2 relative error is less than 
.03 for the mesh size h = ^ (10,498 unknowns) and for various number of smoothings and A. 
The experiments reported here were run in double-precision arithmetic on CM-5 Vector Units 
with 32 processors. 

In the Table I, the numbers in the columns of A = 10, 100, 1000 represent Work Units, 
Communication Units and Ni ter (the number of iterations of CMG). While we have only 
proven that CMG converges for the W-cycle with many smoothing steps, we see that in practice 
it converges even for the V-cycle with one smoothing step. In both cases, convergence is 
independent of the mesh size hk and Lame constant A. The total amount of computational 
work of a 7-level V-cycle is 


y^comp — ^ 


( 7 

E 


k = 2 


(i) 


7-k 


n 7 


P 


N, 


iter 


n 7 


The total amount of communication of the 7-level V-cycle is 


W c 


m 


(7 

£\i 


'(« 


7-k 


n 7 


\\ 


Nit er 

y/n7 
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The total elapsed time is 


T — y^compTcomp LV comm^comm ■ 

Therefore the performance of the multigrid algorithm is dependent upon the ratio between T CO mp 
and T comm . It is not easy to obtain the ratio because it heavily depends on the implementation 
of algorithms, e.g., the topology of data distribution and distance of communications. 


5. ACCELERATION OF MULTIGRID METHOD 


Even though the P-1 conforming multigrid method is robust with respect to A, the 
convergence is slow in any practical sense. In this section we present a heuristic argument 
for the acceleration of the multigrid algorithm CMG. 

Replacing p and q by \foop and \fuq (ui > 1), respectively, we use the argument in Chapter 
3 of [7] to show the uniqueness of the solution of the equations (6) and (7), and to derive the 
following discretization error estimate: 

II « - HfclU 2 (o) + hk ^1 H - ^fclflRn) + V^Wp - Pk\\L?(Q.) 

< C u h\ 1 1| / ||L2(ft) + ^2 || Si\\H l / 2 (Ti) 

l i=l ~ ~ 

Also, following the argument in Section 3, we can develop the same multigrid algorithm for the 
problem: 

Find ( y , z ) € Wj: x Qk-i such that 



z) = (w, r), where (w, r) G W£ X Q k -\ . 


For positive definite systems of which energy norms are equivalent to H 1 norm, the 
normalized eigenfunctions (with respect to L 2 norm) corresponding to the large eigenvalues 
have large H 1 norm, which means that these eigenfunctions are highly oscillatory. However 
our linear system induced from the mixed finite element discretization of the pure traction 
problem is indefinite. Moreover, the solution space is composed of two different spaces. 
One is the space of piecewise linear functions and another is the space of piecewise constant 
functions. Using MATLAB we have investigated the relation between eigenvalues and || u ||#i 

and 1 1 [p] 1 1 z ,2 of normalized eigenfunctions (u,p) (with respect to || • ||| 0 ,A;) where [p] represents 
the jump across the edges of each triangle in 7ib-i. Figure 1 shows the eigenvalues and || u [|#i 

and ||[p]|| L a of normalized eigenfunctions of N^ 1 S k where h = 1/16 (706 unknowns). The 
eigenvectors corresponding to the negative eigenvalues have large 1 1 [p] 1 1 x, 2 , which means p is 
highly oscillating, so that the error of p corresponding to the negative eigenvalues is not 
reduced by smoothing step enough to be corrected in the correction step. By introducing the 
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weighting factor, we can magnify the size of the negative eigenvalues with little effect on the 
general distribution of eigenvalues. Figure 2 shows the eigenvalues and || u \\ H i and ||[p w ]||/,2 of 

normalized eigenfunctions of N^S^k with weighting ui = 7. By employing such a weighting 
factor the magnitudes of negative eigenvalues become larger while that of positive eigenvalues 
grow little. Therefore we expect the better performance of multigrid method for the system 
with the weighting factor. 

Since we use the Gershgorin theorem to estimate the maximum eigenvalue of Njf l S Wj k, we 
always over-estimate it. Therefore for acceleration of our multigrid algorithm, it is useful to 
use damping factor 9 in the smoothing step as follows: 

q2 / \ 

{yi,zi) = (y/-i,^-i) + \ {w,r) - , 1 <l<m. 

There is one more reason why the damping factor is useful. In Figures 1 and 2, there are 
two or three peaks of || ||#i, which means that the error of u corresponding to mid-ranged 

positive eigenvalues is not reduced by smoothing step enough to be corrected in the correction 
step. By using a damping factor the error of u corresponding to several peaks can be reduced 

simultaneously in the smoothing step. Numerical results for the effect of the weighting and 
damping factors are shown in Tables II-IV. However, as 0 — t 2, the multigrid algorithm is 
suddenly divergent so that it is risky to take 6 k. 2 in order to get better convergence results. 
Tables V-VII show the convergence results with 2 smoothings with 6 = 1 for the first smoothing 
and 6 = a for the second smoothing. By the alternating smoothings, we can take 6 near 2 in 
safe. Using these weighting and damping factors, we get about 30 times faster results. 

Acknowledgements. I would like to thank Professor S. V. Parter for his valuable advice 
and encouragement. 
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Table II: V-cycle of CMG with one smoothing, A = 10 and h = 1/64 



LJ = 1 

u> = 3 

u> = 4 

lj = 10 


wu 




CU 




iter 

WU 

CU 

iter 

Will 

68 

582 

1626 

13 

111 

310 

im 

— El 


19 

160 

446 

KB 



1171 

9 

78 

217 

9 

WRZM 

217 

13 

Da 

309 

EES 

38 


907 

7 

58 

161 

7 

WM 

160 

wm 

WM 

EB 

m 

32 


758 

5 

45 

126 

i wm 

44 


■Q 

62 

IB 

1.8 

divergent 


37 

103 

4 


97 

6 


ira 

2.0 

divergent 

divergent 

3 

29 



81 



5 

39 

no 


Table III: V-cycle of CMG with one smoothing, A = 100 and h = 1/64 



U) = 1 

u> = 6 

LJ = 1 

u = 10 

e 

WU 

CU 


WU 

CU 

iter 

WU 


||^| 



iter 

1.0 

244 

2073 


16 

136 

380 

16 

139 

jm 

18 

157 

439 

1.2 

210 

1788 

4992 

■a 



IBDB 


WM 

■a 

109 

304 

1.4 



8381 

IKI 

73 

BfiRl 

8 


be 

IHI 

■rail 

223 

1.6 

c 

ivergent 

7 

41 

1551 

7 

57 

151 

7 

WM 

nm 

KOI 

divergent 

ii 

90 


6 

53 

147 



IB 

Bl 

divergent 

divergent 

divergent 



Table IV: V-cycle of CMG with one smoothing, A = 1000 and h = 1/64 


IB 

UJ — 1 

LJ = 7 

u> = 8 

LJ = 10 

e 

wu 


iter 



iter 

WU 

CU 

iter 

WU 

CU 

iter 

1.0 

334 

2841 

7935 

wm 

144 

wm 

17 

146 

408 

19 

■EH 

440 

1.2 

336 

2855 

7972 

wm 

101 

BEEl 

12 

102 

285 

13 

■mu 

305 

1.4 

c 

ivergent 

9 

78 

217 

9 

76 

213 

9 - 

80 

224 

1.6 

divergent 

8 

67 

187 

7 

62 

173 

7 

62 

173 

1.8 

divergent 

divergent 

10 

88 

247 

6 

53 


2.0 

divergent 

divergent 

divergent 

divergent 


















































































































































Table V: V-cycle of CMG with alternating smoothings, A = 10 and h = 1/64 



w = 1 

w = 3 

u> = 4 

u = 10 

a 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

KWH 

iter 

1.0 

67 

572 

798 

13 

112 

156 

13 

113 

158 

19 

160 

224 

1.2 

56 

473 

661 

11 

92 

128 

11 

92 

129 

15 

131 

183 

KOI 

47 

397 

555 

9 

76 

106 

9 

77 

107 

13 

108 

151 

1.6 

40 

340 

475 

7 

64 

89 

7 

64 

89 

11 

90 

126 

1.8 

35 

298 

416 

6 

54 

76 

6 

54 

75 

9 

76 

106 

2.0 

32 

271 

378 

6 

47 

66 

5 

46 

64 

8 

64 

90 


Table VI: V-cycle of CMG with alternating smoothings, A = 100 and h = 1/64 



u = 1 

uj = 6 

u = 7 

u = 10 

a 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

1.0 

223 

1894 

2645 

16 

136 

190 

16 

138 

193 

18 

158 

220 

HI 

191 

1620 

2263 

13 

112 

156 

13 

114 

159 

15 

129 

180 

KOI 

172 

1461 

2040 

11 

93 

130 

11 

94 

131 

12 

106 

148 

1.6 

170 

1445 

2017 

9 

79 

110 

9 

79 

110 

10 

88 

123 

1.8 

232 

1977 

2761 

8 

69 

96 

8 

67 

94 

9 

74 

103 

2.0 

c 

ivergent 

8 

64 

90 

7 

59 

83 

7 

63 

88 


Table VII: V-cycle of CMG with alternating smoothings, A = 1000 and h = 1/64 



u = 1 

w = 7 

u = 8 

u = 10 

a 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

WU 

CU 

iter 

1.0 

293 

2491 

3478 

17 

143 

199 

17 

146 

204 

19 

158 

220 

1.2 

255 

2171 

3032 

14 

DO 

164 

14 

120 

167 

15 

129 

180 

1.4 

241 

2049 

2861 

12 

98 

137 

12 

100 

139 

12 

106 

148 

1.6 

271 

2308 

3223 

10 

83 

116 

10 

83 

116 

10 

89 

124 

1.8 

c 

ivergent 

9 

74 

103 

8 

72 

100 

9 

74 

104 

2.0 

divergent 

9 

73 

102 

8 

65 

91 

8 

64 

90 
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MULTIPLE SCALE SIMULATION FOR 
TRANSITIONAL AND TURBULENT FLOW 

Chaoqun Liu* and Zhining Liu^ 

Numerical Simulation Group, Department of Mathematics 
University of Colorado at Denver 
Denver, CO 


SUMMARY 

A new concept, multiple scale simulation (MSS), is presented in this paper. The 
basic idea is that the flow is decomposed into several component groups according 
to spatial and temporal length scales. Each group has its own subdomain, govern- 
ing system, mesh size, and discretization method. The simulation is then performed 
groupwise. This approach has been successfully applied in combination with the in- 
tergrid dissipation technique for simulation of transitional and turbulent flow in 3-D 
boundary layers, and it is feasible for 3-D airfoils and other more complex configu- 
rations. MSS should prove to ameliorate the scale problems associated with conven- 
tional direct numerical simulation. 


INTRODUCTION 

The main challenge in direct numerical simulation (DNS) is the demand on com- 
puter resources. Transitional and turbulent flows contain a wide range of length 
scales, bounded above by the geometric dimension of the flow field and bounded be- 
low by the dissipative action of the molecular viscosity (Canuto et al, 1988). The 
ratio of the macroscopic (largest) length scale L to the microscopic (smallest) length 

3 

l (usually called Kolmogorov scale) is L/l = (i?e) 4 , where Re is the Reynolds num- 
ber. Thus, for a 3-D problem, the number of grid points, N , -must be on the order 

g 

of {Re) 1 if the Kolmogorov scale is to be resolved. This estimate reveals a funda- 
mental difficulty with DNS for large Reynolds number flows because this resolution 
requirement is far beyond the capability of current or foreseeable supercomputers. 
However, this estimate is made based on a single simulation on a single grid and 
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Figure 1. Idealized sketch of transition process on a flat plate. 


is, therefore, too pessimistic. Note that the length scales involved in transition and 
turbulence processes are very different: for an open flow, in general, the main stream 
and the linear growth of inflow disturbance are dominated by large scales that dom- 
inate a large part of the flow field domain: small scales generally occur only in and 
after breakdown areas. Extremely small scales are only meaningful in a narrow area 
nearby the solid wall. These observations provide a clue that the total flow may be 
effectively decomposed into several groups based on their length scales. The large 
scale flow, dominating most of the flow field, can be simulated by conventional CFD 
schemes on relatively coarse grids. For small scale flow phenomena, which plays an 
important role only in a small area of the flow field, high-order discretization and 
very fine grids have to be used. These small scale simulations may be performed on 
several grid levels in which each grid has its own subdomain and governing system. 
This idea eventually leads to a multiple scale simulation (MSS) on several levels of 
grids. Unlike large eddy simulation (Reynolds, 1990), the MSS approach does not 
require subgrid models. A basic description of MSS and its performance for CFD 
problems with' simple configuration is the subject of this paper. 

ABSTRACT FLOW DECOMPOSITION EXAMPLE 

Here we consider the flat plate boundary layer flow as an example to describe 
the basic idea behind multiple scale simulation. Figure 1 depicts the natural flow 
transition process in a 3-D boundary layer, showing clearly the variations in flow 
regime scales. 
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Using the fact that the flow scale of interest is generally large in the free stream 
and the area before breakdown (Figure 1), we can consider the use of multiple levels of 
grid to resolve the flow. Figure 2 depicts the case of three levels used in our boundary 
layer example; Vlj represents the domain that level j is used to resolve, with the whole 
computational domain given by 

o = Oj u O2 u O3. 





Figure 2. Multiple level grids. 


To decompose the total flow according to those levels, suppose the physics is 
governed by the time-dependent Navier-Stokes equations, which we write as 

dV ~ ~ 

— + LV = F in 0, 
ot 

V = U + U' at inflow. (1) 

Here, we also decompose the inflow vector into two components (usually, U is the 

— ^ 

steady part with large magnitude, and U' is the unsteady perturbation part with 
relatively small magnitude). We then decompose the total flow field into three com- 
ponents according to 

V = U + U 2 + V 3 , (2) 

— t — * — * 

where Vi, V2, and V3 represent increasingly more local and finer scales of the flow so 
that 


v 2 = 0 inO-n 2 , 

V 3 = 0 in H — n 3 . (3) 
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To define individual governing systems for each component, first consider the 
subdomain fii, on which we impose the system 


^ + L^V X = F l in 

at 

V\ = U at inflow. (4) 

Here, iT 1 is the spatial difference operator in Hi. In general, F x ^ F can be chosen 
with some freedom to represent large scale physics, so that V x represents the large 
scale flow without the inflow disturbance. Thus, (4) can generally be solved by low 
order schemes on a coarse grid. For subdomain f l 2 , we consider the governing system 

BV 

--1 + L^(I^V 1 + V 2 ) = L^I^V X -I^F X + F in ff 2 , 

V 2 = U' at inflow. (5) 

Here, represents some interpolation operator to transfer between Hi and f l 2 . Note 
that V 2 represents the perturbation in the flow field due to the inflow disturbance U' 

— > — + — 4 

and the presumably finer scale source term F — F x . V 2 has a much smaller scale than 
does V x and should be solved by a high-order scheme ( L 0,2 ) on a fairly fine middle 
scale grid. For subdomain fi 3 , which we choose to be a small part of the flow domain, 
the governing system can be written as 


dV 3 

dt 



= £ ni (/§/£v, + /£u) 
= 0 


in H 3 , 
on <9H 3 . (6) 


VTs physical scale is considered to be very small so that (6) should be resolved on an 
extremely fine grid. 


Note that (4)-(6) together with the decomposition (2) represent a consistent 
“lower triangular” formulation that is equivalent to (1) but lends itself to individ- 
ualized treatment of various physical scales in the discretization. Its triangular form 
allows for a simplified solution process: first (4) is solved to determine V x , then 
(5) is solved for V 2 , then (6) is solved for V3, with the final result then given by 

V = Vi + v 2 + v 3 . 


APPLICATION TO POISSON EQUATION 


The idea of multiple scale simulation as described allows for any desired number of 
levels, depending on available computer resources and given accuracy requirements. 
To see the basic process more clearly, we first, use a 1-D Poisson equation as an 
example: 


d 2 d> 

dx 2 


x e (0,1) 


4 >{ 0 ) = </>(!) = 0 . 


( 7 ) 
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This problem has the analytical solution 


4>{x) — 2x(l — x ). 


Using standard central differences for discretization, 

0;+ 1 — 20 ; + 0;_ 1 _ 
h 2 _ 

and three levels (Oi , fl 2 , 0. 3 ), we obtain the numerical solution at selected points: 


in fli 
in 0 2 
in U 3 


01, +1 ~ 201, + </>!,_! 
.5 2 

02 i+1 — 202, + 02, _1 

.25 2 

03, + i — 203, + 03, _! 
.125 2 


where 0i = 7^01, 4>i = 7n 2 3 0i, 


-4, 

_4 _ + l ~ ^ 01 , + 01,-1 X 

1 ,25 2 _ 

_4 _ ~ 2 ^i. + ^1.-1 , 

1 .125 2 

02 = /S02. 



~ 20 2t + 02,_ 1 

. 125 2 


Letting 0^, 0^ 2 \ and 0( 3 ) denote the final solution at grid levels 1, 2, and 3, we 
obtain the results as shown in Table 1. Obviously, the more the grid levels, the better 
are the results. 


solution 

01 

0lP 

02 


03 

0( 3 ) 

analytical 

m 

0 

0 

0 

0 

0 

0 

0 

0(0.125) 


0.125 


0.1875 

0.03125 

0.21875 

0.21875 

0(0.25) 


0.25 

0.125 

0.375 

0.0 

0.375 

0.375 

0(0.375) 


0.375 


0.4375 

0.03125 

0.46875 

0.46875 

0(0.5) 

0.5 

0.5 

0 

0.5 

0.0 

0.5 

0.5 

0(0.625) 


0.375 


0.4375 

0.03125 

0.46875 

0.46875 

0(0.75) 


0.25 

0.125 

0.375 

0.0 

0.375 

0.375 

0(0.875) 


0.125 


0.1875 

0.03125 

0.21875 

0.21875 

m 

0 

0 

0 

0 

0 

0 

0 


Table 1. Comparison of the numerical solution with three grid levels 
and the analytical solution for Poisson equation. 


This simple example illustrates the basic idea underlying MSS, and it suggests 
that it might provide a very efficient way to performing DNS for very complex con- 
figurations. 


FLAT PLATE PROTOTYPE 

In this section, we consider spatial flat plate transitional flow as an example to 
illustrate our approach. 
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Large Scale Simulation (Vi) 


The governing equation for the base flow is governed by the Navier-Stokes equa- 
tions. Suppose there is no mass transfer on the flat plate, and gravity is negligible, 
so that F = 0. The governing equations can then be written as follows: 



dt 


+ (U • V)U + VP 


V ■ Vi 


= 

= 0. 


( 8 ) 


For stead}' flat plate flow, the Blasius solution can be assumed for the large scale 
global component: 


where 


vi = Ccc(.nn)t + 


nf'(n) - 

\/2R^ J 




( 9 ) 


v is the kinetic viscosity coefficient, and / can be found in any textbook on boundary 
layer theory (e.g., Schlichting. 1968). 


Middle Scale Simulation (V 2 ) 

These scales can be determined at inflow for the so-called spatial approach. The 
governing system is 

8V 

+ Z 2 (Vi + V 2 ) = in 

V 2 = U'it) at inflow. (10) 

Considering /^ 2 V \ = («i, 7q, uq, Pi) as known, and using (aq, t/i, zi, ti) as the coordi- 
nate system on fii, and (x 2 ,y 2 , z 2 ,t 2 ) as the coordinate system on fi 2 , then we can 

— ^ 

write the scalar x-component equation for V 2 = (u 2 , v 2 , w 2 , P 2 ) as 

du 2 d(ui + u 2 )(u 1 + u 2 ) d(ui + u 2 )(v 1 + v 2 ) d(ui + u 2 )(wi + w 2 ) 

dh + dx 2 + dy 2 + dz 2 

d(Pi + P 2 ) 1 r d 2 (ui + u 2 ) d 2 (uj + u 2 ) d 2 (ui + u 2 ) , 

dx 2 Re dx\ dy\ dz 2 

djuiUj) d(uivi) djujWx) 1 d 2 u x d 2 u x d 2 u x dPi 

dxi dyi dz\ Re dx\ dy\ dz\ dx\ ' 
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Similarly, the y— and z— momentum equations and the continuity equation are: 

dv 2 d{ui + u 2 )(vi + v 2 ) d(vi + v 2 )(vi + v 2 ) d{vi + u 2 )(ug + w 2 ) 
dh + ~fc 2 + dy 2 dz 2 


d{P l + P 2 ) 
dy 2 


1 r d 2 {vi + v 2 ) d 2 {vi + V 2 ) d 2 (vi + v 2 ) 

i?e dx% dy 2 dz 2 1 


d(vx ui) 5(uiui) d{yiwi) 

I r\ I 


1 r d 2 v l d 2 v x d 2 v u dPt 


dxi ' dyi ' dzi ■ Re 1 dx\ w yi 

du> 2 d(ui + u 2 )(w 1 + w 2 ) d(vi + v 2 ){wi + w 2 ) d{w x + w 2 )(wx + w 2 ) 

dt 2 + dx 2 dy 2 dz 2 


dy\ 


dz\ dyx 


(12) 


d(Pi 


1 


dz 2 


d 2 (wi + w 2 ) d 2 (wi + w 2 ) d 2 (wi + w 2 ) . 
Re L dxl + dyl dz\ 


d(uiw 1 ) d(vi Wi) d(wiWi) 

~d^~ + ~dyT + _ &T 


Re 1 dx\ 
du 2 dv 2 dw 2 

dx 2 dy 2 dz 2 


w y 2 

1 r d 2 wi d 2 w 1 d 2 w\. 

br^ + ^r + 


dy 2 


dz\ 


+ 


1 

r du\ dvi 

- — - d + 

dy\ dzi 


dP x 

dzi 

dwi, 


(13) 

(14) 


Since linear growth and secondary instability are present, V 2 contains a wide range of 
differing length scales, some of them rather small. We thus need to use a high-order 
difference scheme on relatively fine grids. For our purposes, a fourth-order central 
difference scheme on a staggered grid of resolution h = 0(0. 1A) is used, where A is 
the so-called T-S wavelength. 

For a generic partial differential equation, 


m +L * 


s , (15) 

we use a second (or higher)-order backward Euler difference in the time direction: 

+ 0(At 2 ). (16) 


d<f> 

dt 


Zffjk ~ )k + 4> n ijk 


2 At 


Letting L</) = (L/^)"^ 1 , where Lh is the spatial discretization of L described below, 
yields a fully implicit time-stepping scheme. This has much better stability than the 
explicit scheme and is much more efficient for representing the nonlinear N-S system. 
However, it requires solving a large algebraic system at each time step for which we 
have developed a multigrid algorithm based on so-called line-distributive relaxation 
(Liu k Liu, 1993). Only one multigrid V-cycle is usually needed to solve this large 
system, making each implicit time step comparable in CPU cost to a few steps of the 
corresponding explicit scheme. 

To minimize numerical viscosity and phase error, fourth-order central differences 
(under staggered grid frame) in space is applied: 

d(f>. _ — 4>i+2 + — S4>i-i + 4>i^- 2 

dP ~ 12Ax ’ 
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d 2 4 ] 

dx 2 ‘ 
d<j>. 


"4*1+2 + 16^,-+i ~ 30<& + 1 Q4*i 


4 


i—2 


12Ax 2 

"4*1+2 + 27<^,-+i — 27 4 i + <&-i 


24Ax 


( 17 ) 


Figure 3 depicts the stencil of the discretized x — momentum equation for the interior 
grid points. (For simplicity, we drop the subscript “2” in Figures 3 and 4.) 



H%j-\- 2 fc- 

1 ^ij+2 A: 




^*i+i & 


2 

O — 

Pi— 2 jk 

^ t fo ^ijk i 

-*■ o — 

Fj-_i 

1 

J o —] 

Pijk 

1 jfc 

-f?+l jA: 


1 

‘ Hij — 1 A: 



Hij — 2 fc . 

1 H‘ij—2 k 



Figure 3. Neighbor points for the x — momentum equation in the (x, y) plane. 

Since a staggered grid is used when we discretize the x — momentum equation, we 
need to evaluate v at the points associated with u where we have no definition for v. 
This we do by high-order interpolation (Figure 4): 

Vijlc = [9 i v ijk + v ij+l k + v i - 1 jk + ^i-l j + 1 k) 

— (u,_2 j-l k + Vi-2 j+2 k + Vi + 1 j- 1 k + Vi + 1 j + 2 fc)]/32. (18) 


Vi-5 

j+2 k 

Vi- 

. i+i 

Vi 

j+1 k 

Vi+ 

i+2 A; 




Uijk 



Vi-‘ 

2 j~l k 



- 

v t 

jk 

Vi+ 

L j~l k 



Figure 4. Fourth-order approximation for Vpu- at Uijk point. 
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Small Scale Simulation (V3) and Intergrid Dissipation 


The subdomain fl 3 that supports V3 includes the transition zones and near wall 
areas that exhibit very small length scales corresponding to vortex breakdown and 
transition processes. Very fine grids must therefore be used to resolve these scales. 
Fortunately, the task that this represents is substantially reduced by the small size 
of 0 3 (and, perhaps, the fact that the boundary conditions for V3 are homogeneous 
Dirichlet). 

Let I\ denote the interpolation operator for the variables transferring from fi; to 
Qj, and define 

u 0 = if( 7 i«i + tt 2 ), v 0 = ifUjUi + v 2 ), 

w 0 = / 2 3 (/>! + i« 2 ), Po = II(I 2 1 P 1 + P 2 ). 


Furthermore, since the time scale in ^3 is also much smaller, we need to obtain the 
variables at local time t. For example, 


u Q (t) = 


U o{t\) • (tl - t) + U 0 (tl) • (t- t\) 


f 2 


tl 


where t\ and t\ are two time levels in fl 2 . Then the resulting governing system for V 3 
can be written as 


du s 

dt 3 


+ 7 j — [(2uo + ^3)^3] + — — [(uo + ^3)^3] + -5 — (^3^0) 
0x3 dy 3 dy 3 


--^-[{w 0 + w 3 )u 3 ] + -^-(w 3 u 0 ) 
oz 3 uz 3 


1 f d 2 u 3 t d 2 u 3 1 d 2 u 3 ^ ( dP 3 
T Q „, 2 "T 0_2 ) ' 


Re K dx\ 


dyl 


r duo i duouo | du 0 v 0 ( duoWo 1 / d 2 u 0 | d 2 u 0 

'■~Qf ^ RZ I a 7 . I ~aZ d 7 v ^ ° ^ r ' ° "l" 


dz\ 
d 2 uo 


dxi 


dy ; 

dv 3 

dt 3 dx 3 


dz 2 


dyl 


) + 


dx 3 

dP 0 , 


dzl ' dx 


Re 8x2 u U2 u ' < '2 

d d d 

[(«o + ^3)^3] + -q^-{u 3 v 0 ) + v^~[( 2 ^o + 


d u , , ! , d , , 1 l d 2 v 3 d 2 v 3 d 2 v 3 ^ dP 3 

d 7 } {wo + "’ 3) ” sl + dT 3 (a3V ° ) ~ We ( M M Wt 


dv 0 du Q v 0 dv 0 vo dw 0 v 0 1 d 2 v 0 d 2 v 0 d 2 v 0 . dP 0 

' ^ ' 1=1 ‘ 1=1 dT v 0-2 ' Q ..2 ' Q ..2 ) ' 1 ’ 


■dt 


dx 2 
dw 3 


dz 2 


Re y dxl 


dyl 


dz\ dy 2 


dyi 

8 8 8 
m 3 + ^ [( “° + “ 3) “’ 31 + d 7 p sWo) + + ” 3) “’ 31 


+ ]^~( V3W °^ + '§r^ 2,w ° + w3 ^ w ^ 

dw 0 duoWo dvoWo dw 0 wo 

T r\ To + 


1 ,d 2 w 3 d 2 u> 3 d 2 w 3 dP 3 

'V n„2 ' 0..2 ' 0-2 ) ** 


Re x dxl 


dyl 


dU 


dx 2 


dy 2 


dz 2 

du 3 


1 . d 2 w 0 d 2 w 0 

V o n T o o T 


Re K dx 2 


dy 2 


dz 2 

d 2 w 0 


) + 


_ dv 3 dw 3 
dx 3 dy 3 dz 3 


dz 3 
dPo 
dz ’ 
dw 0 


r dup dvp 

dx 2 dy 2 dz 2 




(19) 


( 20 ) 


(21) 

(22) 
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The basic approach we use in fl 3 here is the same as we use in Vt 2 . The grids are 
now much finer, though not yet fine enough to resolve the Kolmogorov scale. Since 
the central difference scheme is nondissipative, trouble occurs in the breakdown stage 
where the shear layer develops and the large vortices decompose into small scale ones. 
The numerical simulation will thus have a huge energy burst, the disturbance velocity 
will be amplified by several orders of magnitude somewhere inside the flow field, and 
the computation then goes unstable. These nonphysical phenomena occur because 
our scheme is nondissipative. and the grid size is not small enough to represent the 
dissipative small vortices. 

The recently developed technique of intergrid dissipation (Liu & Liu. 1994b) can 
be used to provide the dissipation contributed by small vortices without distortion 
of the physical solution. We describe this process as follows. At each time step, we 
make the replacement 

(i -«)\7 + aiLit h v 3 h - 

Here, the scripts h and 2 h indicate the respective fine and coarse grid approximations, 
I'l h and refer to respective restriction and interpolation, and a is a dynamic weight 
factor. In fl 3 , we choose 


a = 


-^’3-ky 3 A~ 3 


(V 3 ■ V 3 ), 


(23) 


where A.r 3 , A y 3 , and Az 3 are the local spacing in the .r-. y — , and ^-directions, 
and A t 3 is the local time step. 


Numerical Test 


For the actual computation, a stretched grid that becomes much denser near the 
solid wall is used. Consider the transformation 


x = 

y = 

Z — — 

J = 


y{v) = 


ymax&l] 


TJmaxG *4" Umaxi^jn 


7 ])' 


c 

| d(£, ?7,Q | 

d(x,y,zy 


Vy, 


and 


U = y v u , 
V = v, 

W = y v w , 
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where y max is the height of the computational domain in the physical coordinate y, 
r]max is the height of the computational domain in the computational coordinate y 3 
and it is a constant that can be used to adjust the concentration of grid points. We 
can then write the contravariant based governing equations on fi 3 as follows: 


dU: 


3 + ~t 2V « + U 3 )U 3 ] + !_[«L±M^] + 

flF dy y v drj L y n 1 


dt ' y n di 
l__d 

1 rd 2 U 3 


Id dP 

+ -wt[(Wo + W 3 )U 3 ] + --^[WsUo] + y°-j^ 

3, . d ,U 3 x . d 2 U 3 . 


Re * 0 1 8? 


^-9 2 U 3 u , 

7 3 ap ( t + M ”tt + ~ap 


,du„ , idu 0 Uo , a,«,v„, , i a , rr , ap„ 

— -I W7 b w- H -XJ [UqWq] + y v -^T- 

dt y„ d£ drj y v y v d( d£ 


1 


r d 2 Uo , 1 d 2 f u 0 , , d f Uo, , d 2 U 0 „ 

WWJ M,v a>, ( y, ) + ap 1} ’ 


Rei L ap 

yj^ + + U a )V 3 \ + ^(UM) + |-[(2V„ + V 3 )V 3 ) 


dt 

d 


dt 


d dP 

+^[(Wo + W 3 )V 3 ] + ^(WsVo) + 


dy 


1 r d 2 V i 

t Wv 


Re* Q ™ de y v dy 


3 , 1 d 2 V 3 , dV 3 , d 2 V 3 , 

4 + yr,y yy -^ b 


5?7 


< 9 C 2 


r dVo , atfoVb , dVoVo , aw 0 y 0 , 3P 0 

-{yv— + + — — + — + 


1 


dt 

:lP^ + 


<9£ ' dy d( ' dy 

1 <9 2 i/ 0 , dv 0 , a 2 i/ 0l1 

"b yvlyy o + y?, Q * 9 ]}, 


iy?7 <9£ 2 y„ a?7 2 yv lyy dy yv d ( 2 
cW 3 , 1 5 r/rr , rrNwl , 1 d InrTT , , d r (V 0 + V 3 )W 3 , 
+ ~df^ Uo + U *) W *\ + ~ ] 


dt y v d£ 

d r V 3 W 0 


dy 1 


+ ^ [ 


dP 3 


yn 


yv 


+ LJL [{2Wo + W 3 )W 3]+ y v dc 


1 <d 2 W 3 

; [ O >0 * 


i s 2 

( „ ) + y^yy aS .. ) + 


Re% dt y„ dy 2 y v 

r dW 0 1 dU Q Wo d Wo 

^ Vv dt dy [ y. 


1 


at 

r d 2 W 0 


dy y n 
1 _d_ 
Vv d( l 


d 2 W 3 

d( 2 


] + ~-^{WoW 0 ] + y, dP ° 


dC 


, 1 a 2 ,Wo , , a, Wo, , a 2 w„„ 

+ -—(—) + ~W ]} ’ 


fie; 1 ap y n ap , h 


au 3 av, aw 3 
a( + ay + ~aT 


,au„ a Vo aw u , 
i at + an + ac ' 


(24) 


(25) 


(26) 

(27) 


For the details about discretization of the above system, see Liu k, Liu (1994a). 

To investigate the efficiency of our MSS approach, we choose to investigate the 
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secondary instability case with Re* 0 = 900. As above, we use only three levels to 
describe the flow. A 130 x 18 x 10 grid is employed for both Oi and fi 2 , which 
includes a 7 T-S wavelength physical domain and a 1 T-S wavelength buffer (Liu & 
Liu, 1993); a 42 x 18 x 18 patch is used for O 3 . The patch covers the downstream half 
of the flat plate except for the buffer domain. The stretch parameter is a = 3.75. 

^ As mentioned above, the Blasius similarity solution is employed as the base flow 
(Vi), which is widely used as the base flow for flat plate transition. A Benney-Lin 
type disturbance (Benney & Lin, 1960), 

U ,{k) (0,y,z,t) = Real{e 2d 4 k d \y)e- iuJ2dt + e 3d+ 4%e-^ t+l > 3z + 

is imposed on the inflow to generate V 2 . Here, <f) 2d (y) and </) 3d ±(y) correspond, re- 
spectively, to 2-D and 3-D eigensolutions of the Orr-Sommerfeld equation, and the 
superscript (k) denotes different velocity components. Other parameters used in this 
work are as follows: 


Re 0 = 

900, Fr = 86 (u 2d = u> 3d 

(3 = 

0-U ymax — ymax — 50, 

&2d = 

0.2229 - 0.00451t, 

d = 

0.2169 - 0.004197, 

t-2d = 

0.03, e 3d± = 0.01, 

0 * 

II 

303.9, x ; nd = 529.4, 

II 

<1 

Tt~s / 240. 


0.0774), 


Figure 5 depicts the contour plots of the spanwise perturbation vorticity (V 2 ) in 
plane = 0.1123 at t = 3 T, AT, ■ • • , 7 T, where T is the so-called T-S period. It is 
quite clear that within this level, the flow scale is still pretty large, and only large 
scale lambda waves can be resolved. 

Figure 6 presents contour plots of spanwise vorticity produced by V 3 in the same 
plane and at the same time as Figure 5. Though this level is still not fine enough to 
catch all of the scales in the flow field, some finer scales are resolved. We find that, 
in the patch (f 2 3 ), more vortices are generated on this level and they are amplified 
when they travel downstream. This is at least qualitatively correct. 

The final results produced by V 2 + V 3 are described in Figures 7 and 8 , showing 
clearly that more physical details can be found than in Figure 5. 

CONCLUDING REMARKS 

We have demonstrated the potential of multiscale simulation for solving fluid flow 
problems to greater resolution and with better efficiency than conventional fixed-scale 
methods provide. However, several important improvements need to be achieved: 
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• The ‘one-way’ refinement approach should be improved by ‘two-way’ grid pro- 
cessing so that the finer scale resolution more effectively influences the global 
coarser scales. This would be more in the spirit of a true multilevel algorithm. 

• The treatment of the artificial local-grid boundaries should be improved by 
other than homogene, ous Dirichlet conditions to achieve better conservation. 

9 The local source terms should somehow be improved to provide more accurate 
fine-scale features. 

® The intergrid dissipation scheme plays an important role in allowing the sim- 
ulation to retain relatively coarse resolution, but the particular choice of the 
weights here is somewhat ad hoc. We may need to find a more physically based 
rationale for determining these weights. 
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Figure 6. Contour plots of the spanwise vor- 
ticity produced by V 2 in plane y$ = 0.1123 
at t = 3 T, 4T, • • • , 7 T (from top to bottom). 


Figure 7. Contour plots of the spanwise vor- 
ticity produced by V 3 in plane = 0.1123 
at t = 3 T, 4 T, • • •, 7 T (from top to bottom). 
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Figure 8. Contour plots of the span wise vor- 
ticity produced by T? 2 -f y 3 in plane = 
0.1123 at t — ST, 4T, • • • , 7T (from top to 
bottom). 
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Figure 9. Contour plots of the spanwise vor- 
ticity produced by C 2 -j- y 3 j n plane Zq = 0 at 
t — ST, 4 T, ■■■ ,7 T (from top to bottom). 
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A NOTE ON SUBSTRUCTURING PRECONDITIONING FOR 
NONCONFORMING FINITE ELEMENT APPROXIMATIONS OF 
SECOND ORDER ELLIPTIC PROBLEMS 


Serguei Maliassov* 


SUMMARY 

In this paper an algebraic substructuring preconditioner is considered for non- 
conforming finite element approximations of second order elliptic problems in 3D 
domains with a piecewise constant diffusion coefficient. Using a substructuring idea 
and a block Gauss elimination, part of the unknowns is eliminated and the Schur 
complement obtained is preconditioned by a spectrally equivalent very sparse matrix. 
In the case of quasiuniform tetrahedral mesh an appropriate algebraic multigrid solver 
can be used to solve the problem with this matrix. Explicit estimates of condition 
numbers and implementation algorithms are established for the constructed precon- 
ditioner. It is shown that the condition number of the preconditioned matrix does not 
depend on either the mesh step size or the jump of the coefficient. Finally, numerical 
experiments are presented to illustrate the theory being developed. 

1. INTRODUCTION 

Let 0 be a convex bounded domain in IR 3 with boundary 50. Consider an elliptic 
problem 

-V • (k ■ Vu) = / in 0, 

u = 0 on r 0 , (1) 

£ = 0 onU, 

where &(x) is a uniformly positive bounded function, /(x) £ T 2 (0), To U Ti = 50, 

r 0 n r x = 0, and r 0 = r 0 ^ 0. 

Note that an approach considered in this paper is valid also for the case of the 
Neumann problem, i.e. To = 0, and it is not described here only for the sake of 
simplicity. 

Let the bilinear form a(-, •) be defined by 

a(u,v) — ( k • Vu, Vi>), u,v £ Vo(0) = {w £ LT 1 (0) : v — 0 on r 0 }, 
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where (-, •) denotes the P 2 (D) inner product. Then the usual weak form of (1) for the 
solution u G Vo(D) is 

a(u,v) = (f,v), Vu € Vo(D). (2) 

Let Tt be a regular partitioning of D into simplexes T with a mesh size h and let 
14(D) be the Pi -nonconforming finite element space of functions v G T 2 (D) [1] such 
that v\j are linear for all' T G Tj and v are continuous at the barycenters of T G Tt 
and vanish at the barycenters of the boundary faces on IV Note that the space 14(D) 
is not a subspace of H 1 ( D). 

Define the bilinear form on 14(D) by 

a h (u,v) = (k • Vu, Vv) r , Vu,uG 14(D), (3) 

Ter T 

where (-,-)x is the L 2 (T ) inner product, T G Tt. Then the Pi-nonconforming finite 
element discretization of (1) is to find Uh G 14 such that 

a h (u h ,v) = (f,v), Vu G 14(D). (4) 


Once a basis {^(x)}^ for 14(D) is chosen, (4) leads to a system of linear algebraic 
equations. Write u(x) = YliLi Ui<Pi(x). Then (4) becomes 


N 

Uia h (<pi , <pj) = (/, ipj), j = 1 , . . . , iV, 

t=i 

or in matrix representation 

Au = f, (5) 

where A j{ = a h (tpi, tpj), fj = (/, <?j), i,j = 

The first efficient solvers for nonconforming finite element approximations were 
proposed and investigated in [1] and [2]. Further developments can be found in [3], 
[4], and [5]. 

In this paper we will describe and analyze a method of constructing the precondi- 
tioner for (5) using an idea of algebraic substructuring (see [6] and [7]), which consists 
of the following main steps. 

First, we represent the matrix A from (5) in a 2 x 2 block form 

a _ Tin A 12 

A 21 A 22 ’ 

where An : IR^’ — > R^', i = 1,2, Ah + ./V 2 = N, in such a way that the block A 2 2 
is easily invertible. With the introduction of the Schur complement An = An — 
A 12 A 22 * A 2 i, the matrix A can be rewritten in the form 



An + Ai2A 22 1 A2i A 12 

A 21 A 22 


(7) 
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Then, we reconstruct the directed graph of the Schur complement A\\ in such 
a way that the resulting matrix S has the same kernel, is still positive definite (or 
positive semidefinite if the matrix A is singular), and is spectrally equivalent to the 
matrix An, i.e., 

cq(S\i , u) < (Aiiu, u) < ci(Su, u), Vu € 1R Ni , 

with constants Co and c\ independent of the mesh step size h and the jump of the 
coefficient k(x). 

To precondition the matrix S we make the same steps. That is, the matrix S is 
represented in a 2 x 2 block form 

r. _ Sn S 12 
~ [ S 21 S 22 ’ 

where Sa : IR^ 11 — ► IR^ 1 ', i = 1,2, Nu + N\ 2 = Ni, in such a way that block 
S 22 is easily invertible, so that Schur complement Su = Sii — Si 2 S 22 S 2 i is easily 
computable. 

Finally, following the ideas in [8], [9], and [10], we construct matrix spectrally 
equivalent to with constants 0 < d 0 < d,\ independent of the mesh size parameter 
h and the jump of the coefficient A:(x): 

do(SuV, v) < (S'nv,v) < di^nv, v), Vv e 1 R Nu . 

Then the matrix 

-Sll + SuS^Sn Si 2 1 , A A-1 A A 

n n + ^ 12^22 ^12 

021 022 

21 ^22 
is spectrally equivalent to the matrix A, i.e., 

r 0 (.£?u,u) < (Au,u) < ri(5u,u), Vu G IR N , 

where r 0 = min{l;co} • min{l;d 0 }, = max{l;ci} • max{l;di}. In the case of 
quasiuniform mesh and piecewise constant coefficient k(x), an algebraic multigrid 
method (AMG) [11], [4], [9], [10] can be used to construct such a matrix Sn. 

In other words the reconstruction of the directed graph of the matrix is equivalent 
to constructing the equivalent norm on finite dimensional space. An implementation 
of this approach depends on the structure of the graph of matrix A and, consequently, 
on the type of nonconforming finite element space 14. A detailed description of con- 
structing algebraic substructuring preconditioners for one concrete case of the Pi- 
nonconforming space 14 was given in [12], [13], and [14]. In all these papers authors 
defined the partitioning %, of the whole domain by subdividing it into topological par- 
allelepipeds and splitting each parallelepiped in turn into six tetrahedra. The present 
paper extends these results to the case of splitting each topological parallelepiped into 
five tetrahedra. 




( 8 ) 
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The explicit bounds of the spectrum of the preconditioned matrix are obtained 
with the help of the superelement analysis [12], [10], [7], [15]. 

The outline of the reminder of the paper is as follows. In Section 2 we consider 
a formulation of the model problem with piecewise coefficient &(x) when Cl is a unit 
cube. Then, in Section 3 we develop an algebraic substructuring preconditioner for 
the resulting linear system and give an implementation algorithm. In Section 4 we 
outline the algebraic multigrid method we use to precondition the Schur complement 
obtained in Section 3. Finally, the results of the numerical experiments and some 
conclusions are given in Section 5 to illustrate the theory being presented. 

2. PROBLEM FORMULATION 


To explain our approach we consider the model case when 0 is a unit cube in IR 3 , 
the boundary conditions are uniform, and k(x) is a piecewise constant function. Note 
that an extension of the method for the case of Cl being a union of parallelepipeds is 
straightforward . 

Let Ch = be a partition of Cl into uniform cubes with the length of the 

edge h — 1/n, where ( Xi,yj,Zk ) is the right back upper corner of the cube C^ ,k \ 
Next, divide each cube C^ ,k ^ into 5 tetrahedra as shown in Figure 1 and denote this 
partitioning of Cl into tetrahedra by Th . Note that we have two types of the partition- 
ing of the cubes C^’ k ^ into tetrahedra and the cube with one type of partitioning has 
all adjacent cubes of another type. Below we assume that function k(x) is a constant 
on each cube C £ Ch- 



FIGURE 1. Partition of cubes C^ ,k ^ into tetrahedra. 


We introduce the set of bary centers of all faces of the tetrahedral partition of Cl and 
the set Qh of those barycenters not on r 0 . The Crouzeix-Raviart Pi -non conforming 
finite element space 14 is defined by 


V h = {ve L 2 (Cl) : 


v\t € Pi(T), VT £ Th~, v is continuous at the barycenters 
from Qh and vanishes at the barycenters of faces on To}. 

(10) 


Let its dimension be N. Note that N « 10n 3 . 
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Now we define the bilinear form on 14 by 


a,h(u,v) = ^2 / k(x)Vu ■ Vu dx, Vu,wGl4- (11) 

TeT h 

Thus the nonconforming discretization of the problem (1) is given by seeking 
Uh £ 14 such that 

a h (u h ,v) = (f,v), V u G 14. (12) 

For any function Vh G 14 we denote by v G IR^ the corresponding vector of its degrees 
of freedom. 

Let (u,v)jv be a standard bilinear form defined on IR^ by (u,v)jv = Y2xeQ h «(x)i;(x), 
Vu, u G 14- Then the discretization operator 4 : IR^ — » IR^, which is symmetric and 
positive definite, is defined by 

(Au,v) N = a h (u,v), u,v e V h . (13) 

Similarly, we introduce the vector f by (/, v) = (f, v)jv, V v G 14- Now, problem 
(12) can be rewritten in a matrix form 

Au = f. (14) 

For each cube C = C^' hk ^ G (4, denote by the subspace of the restriction of 
the functions in 14 into C. For each v G V/f, we indicate by v c the corresponding 
vector. The dimension of is denoted by N c . Obviously, for a cube without faces 
on To we have N c = 16. 

The local stiffness matrix A c on a cube C G <4 is given by 

(A c u C) v4 c = J2 (^( x )Vu /l , Vu/ 1 )t, Vu h ,v h ev£. (15) 

TCC 

Note that matrices A c are positive definite when C D To 7^ 0 and semidefinite oth- 
erwise. The global stiffness matrix is determined by assembling the local stiffness 
matrices * 

(Au,v) n = (A c u c ,v c )jv c , Vu, v G IR n . (16) 

cec h 

3. ALGEBRAIC SUBSTRUCTURING PRECONDITIONER OVER A CUBE 

In this section we construct the algebraic substructuring preconditioner outlined 
in the Introduction. Toward the end of the section, we divide all unknowns in the 
system into two groups: 

1. The first group consists of 
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(a) one unknown per cube corresponding to the 1st face of those tetrahedra 
that are internal for each cube C € Ch (see Figure 2, face 1). 

(b) all unknowns corresponding to faces of the cubes in the partition Ch, with- 
out the faces on r 0 (Figure 2, faces 2, 3, , 13). 

2. The second group consists of the unknowns corresponding to the faces of the 
tetrahedra that are internal for each cube and that are not in the 1st group 
(these are unknowns on faces 14, 15, and 16 in Figure 2). 



FIGURE 2. Local enumeration of faces in a cube. 


The splitting of the space IR W induces the presentation of the vectors v T = 
( v i', v 2 T ), where Vi 6 IR^ 1 and V 2 € IR^ 2 , and V 2 corresponds to the unknowns 
of the second group. Obviously, N 2 = 3 n 3 and Ni = N — 3 n 3 . Then the matrix A 
can be presented in the following block form: 


A = 


A21 


A12 

. A 22 ’ 


(17) 


where An : IR^’ — > IR^', i = 1,2. 

Note that the matrix A 2 2 is block diagonal and can be inverted locally (cube by 
cube). Thus, Schur complement An = An — A 21 is easily computable. 

The local stiffness matrices on each cube also have the block form: 



All, c Ai2, c 
A21,c A22,c 


? 


(18) 


where A 22 ,c are 3x3 matrices. 

An important fact which is established by direct computations is that the matrix 
An can be obtained by assembling over all cubes local matrices An, c = An >c — 
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Ai2,cA 2 2, c A21,c'- 


(illU!,Vi) = J2 ^-^(^ll,cUl,c,Vi lC ), VUi,Vi G M Nl . 
cec h 1 

Here Ui >c is a restriction of u v into the nodes of the first group on the cube C G Ch , 
and for the cube C G Ch without faces on r 0 we have dim Ui, c = 13. 

Let us consider a cube C that has no face on the boundary r 0 and enumerate the 
faces Sj, j = 1, . . . , 16, as shown on Figure 2. Then the local matrices A,j )C , *, j = 1,2, 
of this cube have the following form: 

' 9/2 — r T 0 0 0' 

-r / r 9 — 1 -1 ' 

A n ,c= 0 / , A 22 , c =- -1 9 -1 , (19) 

0 I [ -1 “I 9 

0 / J 

' -1/2 000 -1 -1 -1 0 0 0 0 0 O' 

Ai 2 c = A 2 i, c = - 1/2 000 0 0 0 -1 -1 -1 0 0 0 , 

[ -1/2 000 0 0 0 0 0 0 -1 -1 -1. 

where r = [ 1 1 1 T , and I is 3 x 3 identical matrix. 

The local Schur complement matrix An iC for this cube has the form 

30 — 7r T — r T — r T — r T 
-7r 7/ 0 0 0 

i n , c =i. -r 0 T -R -R , (20) 

' -r 0 -R T -R 

-r 0 -R -R T 


where 



27 -8 -8 ' 
-8 27 -8 , 

-8 -8 27 



1 1 1 
1 1 1 


1 1 1 


Along with the matrix Au, c we introduce on each cube C G Ch the 13 x 13 matrices 

S c by 



and define N\ x Ni matrix S by assembling over all cubes local matrices S c : 

(S’u 1 ,v 1 )= ¥k(ScUi >c ,v liC ), Vui, Vl GlR Nl . 
cec h 1 
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It is easy to see that the matrices An,c and S c have the same kernel, i.e., kerAi ljC = 
ker5 c . 

We now consider an eigenvalue problem for p ^ 0: 

Ah >c ii = (J,S C u, u ^ 0, u G 1R 13 . (22) 


Direct calculations show that the eigenvalues of this problem belong to the interval 

\ftmini /^moi]) where firnin = 1/7 and /-I max = 1- 

Defining a new N c x N c matrix on each cube 



S c + Ai2,cA 2 2, c A21, 


121 , c 


Al2,c 
A22,c ’ 


we define the symmetric positive-definite N x N matrix B by 

(Bu,v)= Y. (B C u c ,v c ), 
c eCh 


(23) 

(24) 


where v, u € IR^, and u c and v c are their respective restrictions on the cube C. 

To estimate the condition number of the matrix B~ l A we use so called superele- 
ment analysis (see [16], [9], [17], [7]). Namely, it is easy to show the following inequal- 
ities: 


(Au, u) 

max 7— = max 

(Bu,u)?£0 (5 u,u) (Bu.uJjiO 


£ (A c u c ,u c ) 

C€C h 

E {B c U c ,u c ) 
cec h 


< 


max 

cec h 

(B^UcUcJjiO 


(A g u c ,u c ) 

(B c u c ,u e ) 


and 


(Au,u) 

mm 7— r 

(Bu,u)#o (5u,u) 


min 

(Bu,u)?£0 


E (A g u c ,u c ) 
cec h 

E (5 G u c ,u c ) 


> mm 
cec h 

(BC Uc,U c )*0 


(v4 g u e , u e ) 
(B c u„ u, : )' 


(25) 


(26) 


From the inequalities (25) and (26) we see that to estimate the condition number of 
B _1 A, it is sufficient to consider the local eigenvalue problems for /i c / 0 on each 
cube: 

A c u c = p c B°u c , u c ^ 0, u c € IR^. 

From (22) direct calculations show that the eigenvalues p c are within the interval 
[1/7,1]. Then the inequalities (25) and (26) yield: 

PROPOSITION 1. The eigenvalues of the problem Au = \Bu, u/0, belong to the 
interval { 1/7,1], and the condition number is thus estimated by cond (B~ X A) < 7. 

We stress that the condition number of the matrix B~ X A is bounded by a constant 
independent of the mesh step size h and the jump of the coefficient fc(x). 

Let us take the matrix B from (24) as a preconditioner for the matrix A. In the 
terms of the group partitioning introduced above it has the following block form 

g _ S + A12A22 A21 A \2 
Ai 1 A22 


(27) 
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As we noted earlier, the matrix A 22 is block-diagonal and can be inverted locally on 
cubes. So we concentrate on the linear system 

Sw = G, w,GeTR Nl . (28) 


The matrix S also can be represented in the block form 


S = 


Sn 

S 21 



(29) 


where the block S 22 corresponds to the nodes from the subgroup (b) of the first group, 
which are on the faces of cubes C <E Ch- From the definition of S, it can be seen that 
the matrix S 22 is diagonal. In the above partitioning, we present w and G in (29) in 
the form 


w = 



G = 


Gi 

G 2 ’ 


(30) 


where the dimension of vectors w a and Gi is obviously equal to M — dim w x = n 3 
and dim w 2 = Ni — n 3 . Then, after elimination of the second group of unknowns 
w 2 = S7 2 1 (G 2 — 5 2 iWi), we get the system of linear equations 


(5ii — 5'i 2 5 22 1 5 , 2 i)w 1 — Gi — 5 'i 2 S' 22 1 G 2 = Gi, (31) 


where the vector Wi and the block Sn correspond to the unknowns from the subgroup 
(a) of the first group, which have only one unknown per cube. 

Thus, if we define as above the Schur complement of matrix S by Su = Su — 
«S , i2‘5 , 22 1 ‘S' 2 i, matrix B can be presented in the form 


B = 


Sn + -S'i 2 S' 22 1 5 2 i 


221 


Si 2 
S 22 
A21 


+ A12A22A21 


A 12 
A22 


(32) 


where matrix A 22 is block diagonal and S 22 is diagonal and can be inverted locally 
cube-by-cube. Again, we have to stress that the condition number of the matrix B _1 A 
is bounded by the constant independent of the mesh step size h and the jump of the 
coefficient fc(x). The matrix B can be referred to as a three- level preconditioner. 

It is easy to see that the Schur complement Sa is a “7-point-scheme” matrix. In 
the next section we consider the solution techniques for problem (31) with the matrix 

C u . 


4. MULTILEVEL PRECONDITIONER OVER A CUBE 

While the preconditioner B has good properties, it is still not economical to invert 
it because the entries of the matrix Su depend on the jump of the coefficients. In- 
this section we propose a preconditioner for the matrix 5n provided that additional 
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assumptions on the behavior of the function k{x) are met and show that for this 
modification we can use any well-known multilevel procedure. 

Assumption (Al). Suppose that unit cube 0 can be represented as a union of a 
certain number m of pairwise disjoint cubes Gi, i = 1, . . . , m, with the size of edge H 
(H > 2 h) in such a way that in each cube Gi the function k(x) is a positive constant. 

_ m _ 

In other words, we set = (J Gi and k(x) = consti > 0, x E Gi, i = 1, . . . , m. 

i=i 

Now define on 0 an auxiliary parallelepipedal mesh Tc with vertices in the centers 
of cubes E Tc and in the centers of the boundary faces fl dfl. Let us 

consider a standard partitioning of Tc into tetrahedra Th and enumerate the nodes 
of this mesh in accordance with the enumeration of the cubes of Tc . 

Then define the piecewise constant function k(x) to be constant on each cube 
C^Tk) e f c by 

k(x) = min oi {fc(*+“bW+7) j ^ x <E C {i ’ 3 ’ k \ (33) 

and consider the boundary value problem 

-V-(jc Vu) = g in 0, u = 0 on r 0 . (34) 

Denote by Uh a usual (conforming) finite element space of all continuous piecewise 
linear functions on Th that vanish at the nodes of r 0 . Note that dim Uh = M. And, 
finally, define the symmetric positive definite matrix C by 

(C'u,v)a/ = J kVu-'Vvdx Vu,v£Uh, (35) 

Q 

where u, v are the vectors of degrees of freedom corresponding to the functions u and 
v, respectively. 

Consider an eigenvalue problem 

^nu = p,C u, u^O, u E IR M . (36) 

The following statement plays a very important role in all further arguments [15]. It 
can be established by straightforward computations. 

PROPOSITION 2. The eigenvalues of the problem (36) belong to the interval 
[1/2,1]- 

Now instead of the matrix (32) we define new matrix B by 

C + 5'i 2 S'^2 1 5'21 Sl2 1 . A-lA A 

n c + ^12^22 •'*■21 ^12 

021 ^>22 

A.21 A 22 

Then we can formulate the following theorem. 
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THEOREM 1. The matrix B defined in (37) with the block C defined in (35) is 
spectrally equivalent to the matrix A and cond(5 -1 y4) < 14. 

Thus, we have constructed a spectrally equivalent sparse preconditioner for the 
Schur complement after the elimination of almost 90% of the original unknowns. We 
note here that matrices A 22 and S 22 are block diagonal and, with B as a preconditioner 
for the matrix A , we have to develop procedure for solving the linear system of 
equataions 

Cu = G, u € IR M . (38) 

We have to stress that the function A’(x) is piecewise constant. Thus, any multi- 
level procedure which works well for such problems (34) can be used. 

We apply the preconditioned conjugate gradient method to solve the problem 
(13) with the matrix B from (37) as a preconditioner for the matrix A and use the 
multilevel domain decomposition method (MGDD) [9], [10], [15] to solve the problem 
(38) with matrix C\ we establish the following results. 

STATEMENT 1. If we use the MGDD method to solve problem (38) with the matrix 
C, then the condition number cond(i? _1 A) does not depend on mesh size h and the 
jump of the coefficient k(x). 

STATEMENT 2. The number of operations for solving the system Au = f by the 
preconditioned conjugate gradient method with preconditioner B and with accuracy e 
in the sense 

||u fe,+1 - u*|U < e||u° - u*\\ A , 

is estimated by C ■ N • In where u* = A _1 f, u° G IR^, and C does not depend on 
N and jump of the coefficient k(x). 

5. RESULTS OF THE NUMERICAL EXPERIMENTS 


In this section the preconditioner being considered is tested on the model problem 

-V-(fc(x)V«) = /, inO = [0,l] 3 , 

u = 0, on dD. 


In the numerical experiments presented we use the preconditioner B in the form 
(37). In this case by the Theorem 1 Cond 1 B~ l A < 14. The problem with matrix C 
is solved by the multilevel domain decomposition method, as described in [15]. 

The domain is divided into M = n 3 cubes (n in each direction) and each cube 
is partitioned into 5 tetrahedra. The dimension of the original algebraic system 
is N = 10n 3 — 6n 2 . The right hand side is generated randomly, and the accuracy 
parameter is taken as e = 10 -6 . The condition numbers of the preconditioned matrices 
B~ l A are calculated by the relation between the conjugate gradient and Lanczos 
algorithms. The coefficient k(x) is piecewise constant and is defined to be 


k(x,y,z) = 


k , (a;, y, z) € [0.5, 1] x [0.5, 1] x [0.5, 1] 

l , elsewhere 


(39) 
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The results are summarized in Table 1, where ni ter and cond denote the iteration 
number and condition number, respectively. All experiments are carried out on a 
Sun workstation. It takes approximately 25 minutes to solve the problem of the 
largest dimension N = 1 235 000. 

From Table 1 we see that the condition number does not depend on either the 
step mesh size h or the jump of the coefficient &(x). 


Table 1. Solving C by MGDD method 



20 x 20 x 20 
N = 17 600 

30 x 30 x 30 
N = 264 600 

40 x 40 x 40 
N = 630 400 

50 x 50 x 50 
N = 1 235 000 

k 

^iter 

cond 

^iter 

cond 

^■iter 

cond 

Ujter 

cond 

1 

14 

5.32 

14 

5.30 

14 

5.29 

14 

5.28 

10 

17 

6.59 

17 

6.53 

16 

6.37 

16 

6.29 

100 

17 

6.94 

17 

6.90 

16 

6.89 

16 

6.88 

1000 

17 

6.98 

16 

6.96 

16 

6.95 

16 

6.93 

10 4 

16 

6.98 

16 

6.96 

16 

5.95 

16 

6.94 

0.1 

16 

5.97 

16 

5.96 

16 

5.96 

15 

5.94 

0.01 

16 

6.02 

16 

6.02 

16 

6.00 

15 

5.97 

0.001 

16 

6.02 

16 

6.01 

16 

6.00 

15 

5.97 

10- 4 

16 

6.02 

16 

6.01 

16 

6.00 

15 

5.97 
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CONVERGENCE OF A SUBSTRUCTURING METHOD 
WITH LAGRANGE MULTIPLIERS 1 


Jan Mandel and Radek Tezaur 
Center for Computational Mathematics 
University of Colorado at Denver 
Denver, CO 


SUMMARY 

We analyze the convergence of a substructuring iterative method with Lagrange 
multipliers, proposed recently by Farhat and Roux. The method decomposes 
finite element discretization of an elliptic boundary value problem into Neumann 
problems on the subdomains and a coarse problem for the subdomain nullspace 
components. For linear conforming elements and preconditioning by the Dirichlet 
problems on the subdomains, we prove the asymptotic bound on the condition number 
(7(1 + log (R/h)) 7 , 7 = 2 or 3, where h is the characteristic element size and H is the 
subdomain size. 


INTRODUCTION 


We analyze the convergence of a substructuring method with Lagrange multipliers, 
proposed by Farhat and Roux [11] under the name Finite Element Tearing and 
Interconnecting (FETI) method. The main idea of the FETI method is to decompose 
the problem domain into nonoverlapping subdomains and to enforce continuity on 
subdomain interfaces by Lagrange multipliers. Eliminating the subdomain variables 
yields a dual problem for the Lagrange multipliers, which is solved by preconditioned 
conjugate gradients. This idea is related to the fictitious domain method where the 
Lagrange multipliers enforce boundary conditions as in Dinh et al. [5]. 

Elimination of the subdomain variables is implemented by solving Neumann 
problems on all subdomains in every iteration, which can be done completely in 
parallel. However, the subdomain problems are singular, so a small auxiliary problem 
for the nullspace components of the subdomain solutions needs to be solved in every 
iteration. This is an added complication, but also a blessing. Farhat, Mandel, and 
Roux [10] have shown numerically and have proved for the FETI method without 
preconditioning that the auxiliary problem plays the role of a coarse problem, namely, 
it causes the condition number to be bounded independently of the number of 

1 This research was supported by the National Science Foundation under grants ASC-9217394 
and ASC-9121431. This paper has been submitted for journal publication elsewhere. 
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subdomains. The method was further extended to time-dependent, problems, which 
lack the naturally occurring coarse problem, by Farhat. Chen, and Mandel [9]. 

In this paper, we show that the condition number of the preconditioned 
FETI method is bounded independently of the number of subdomains and 
polylogarithmically in terms of subdomain size, as is the case for other optimal 
nonoverlapping domain decomposition methods [3. 6. S. 16. 17]. We refer to [10] 
for numerical results that confirm the theory and for parallel implementation and 
performance. 

The FETI method is in a sense dual to the Xeumann-Neumann method 
with a coarse problem, developed by Mandel under the name Balancing Domain 
Decomposition [15] based on an earlier method of de Roeck and LeTallec [4]. A 
modified method was analyzed by Dryja and Widlund [S]. 

Analysis of domain decomposition methods typically proceeds by demonstrating 
spectral equivalence of the quadratic form that defines the problem in a variational 
setting and the quadratic form that defines the preconditioner, often by way of the 
P. L. Lions lemma [1. 6. 7. 14]. Since the preconditioner in the FETI method is quite 
complicated and is not defined in terms of a quadratic form, we proceed differently and 
find a bound on the norm of the product of the system operator and the preconditioner 
to bound the maximal eigenvalue, as well as a bound on the inverse to bound the 
minimal eigenvalue. Related analyses were previously done for methods without 
crosspoints between the subdomains, or done formally in functional spaces (cf., for 
example, Glowinski and Wheeler [12]). In this paper, we present a. complete analysis 
in terms of upper and lower bounds on the preconditioned operator for decompositions 
with crosspoints in 2D and edges and crosspoints in 3D. 

FORMULATION OF THE METHOD 

In this section, we briefly review formulation of the FETI method according to [10], 
where one can find more details about the algorithmic side. At the same time, we 
introduce the spaces and operators that will be used in our analysis. 

We consider iterative solution of a system of linear equations Lx = b arising from 
a finite element discretization of an elliptic boundary value problem on a bounded 
domain D, which is decomposed into nonoverlapping subdomains f],-, i = 1,. . . , n 5 . 
The matrix A is assumed to be symmetric and positive definite. Let 

Wi = WW) (1) 

be the space of local vectors of degrees of freedom associated with the boundary of 
fl,-, and let 

y = vuU 5ft i) ( 2 ) 

i=i 

be the space of global vectors of degrees of freedom associated with all subdomain 
boundaries. The correspondence of the local and global vectors of degrees of freedom 
is given by zero-one matrices Nj : IT, — > Y . 
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We find it convenient to identify vectors of degrees of freedom, which are in some 
spaces IR n , with the associated finite element functions. Operators between the spaces 
are represented as matrices, and we frequently commit an abuse of notations by using 
matrices and operators interchangeably. The l 2 inner product is denoted by (•, •) on 
all spaces. The associated norm is ||it|| 2 = (u,u). The transpose of a matrix M is 
denoted by M' . 

After elimination of the interior degrees of freedom in all subdomains 0, , we obtain 
the reduced system of linear equations for the vectors Wi £ W % of degrees of freedom 
on subdomain boundaries, which we write in subassembly form as 


f^NiSw = / 

i = l 

n s 

( 3 ) 

£ B t w t = 0 
1=1 

( 4 ) 


Here, Si are the Schur complements of the subdomain stiffness matrices obtained 
by elimination of the interior degrees of freedom, and B{ are matrices with entries 
0,1,— 1 such that (4) expresses the continuity of the solution between subdomains, 
that is, the requirement that the values of degrees of freedom common to more than 
one subdomain coincide. 

To describe the method in a concise form, we need to define the following spaces. 
IT is a space of all boundary degrees of freedom on all subdomains: 

W = (g) Wi (5) 

i=l 

X is a space of vectors with entries corresponding to pairs of degrees of freedom on 
the interfaces where we enforce continuity: 

x c (g) I4(da-nd%). (6) 

dQiC\d£lj^:$ 


Denote the block matrix 


B:W ^X = {B u ...,B ns ) (7) 

and the space of Lagrange multipliers 

U = Range B. (8) 

These are the details we need for the purpose of describing the method. A more 

specific description of B will be given in the next section. Finally, denote the 
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symmetric block diagonal matrix 


j s x 0 

0 S 2 0 


5 : W W, S = 


V o 


0 \ 


• 0 

0 s n . J 


(9) 


The problem (3) and (4) can now be written as the minimization of total 
subdomain energy subject to the continuity condition: 

S(w) = g (Sw,w) + (/, w) — > min, subject to w G W, Bw = 0. (10) 

Writing the Lagrangean of this minimization problem 

C(w,\) = ^(Siu,w) + ( f,w ) + (\,Bw), to eW, A <E U, 
we solve the dual problem 


max inf C(w, A) = maxC(A) 
\eu wew y \eu v ' 


( 11 ) 


By a direct computation, 


C(A) 


— oo if (/, w) + (A, Bw) ^ 0 for some w 6 Ker S, 
-\(S+{f - B'\)J - B'X) otherwise, 


( 12 ) 


where S + : W — >• W is any pseudoinverse of S , i.e., an operator such that w = SP~g 
solves Sw = g if g J_ Ker S. It is easy to see from (12) that the choice of S + does 
not change the value of C. Without loss of generality, assume that S + is given by the 
spectral decomposition 

S + = J2j v t v t, (13) 

t > 0 T 

where 

S = ^2tv t v' t , Sv t = tv t , v' t v t = l. (14) 


The dual problem (11) is equivalent to maximizing C( A) on the admissible set 


A={XeU\ C( A) > -oo}. 

Define the space of admissible increments 

V = { Ai — A 2 | Ai g A, A 2 g -4} 

= {{i G U | (fj,,Bw) = 0 Vrc G Kerf?}. 


(15) 
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At the maximum of C(A), A E A, the derivative of C is zero in all directions in V : 

DC{ A; ?) = 0 V/iGV. 

By a straightforward computation, this becomes 

AeA, (-BS+B 1 A + BS+f, ?) =0, \/fi e V. (16) 

In order to express (16) as a linear equation in the space V, let Py '■ U V be the 
projection onto V orthogonal in the l 2 inner product Then for ? G V, 

(-BS+B'X + BS + f , ?) = (-BS+B 1 A + BS + f, Py?) = { Py(-BS + B’X + BS+f),?) 

since Py is orthogonal, so Py = Py. Therefore, the dual problem (11) is equivalent 
to the linear equation in V for the unknown ?: 

?eV, Py{-BS + B , (fi + X o ) + BS + f)=0, (17) 

where A 0 is an arbitrary starting feasible solution, i.e., A) € A. 

The FETI method is the method of preconditioned conjugate gradients in the 
space V applied to the linear equation (17). The linear part of the operator in (17) 
is PyF, where 

F = BS+B’. (18) 

We consider the preconditioner PyM , where 

M = MSA, A = \b'. (19) 

That is, in each iteration of the preconditioned conjugate gradients algorithm, 
z = PyMr is evaluated as an approximate solution of the residual equation PyFz = r. 
The prqconditioner (19) was proposed in [10]. Note that the evaluation of the 
matrix- vector product Su can be implemented by solving a Dirichlet problem in each 
subdomain; therefore it is called the Dirichlet preconditioner in [10]. 

ANALYSIS 


A well known bound on the reduction of the error in k iterations of the method 
of preconditioned conjugate gradients in the norm |||e||| = {PyFe^e) 1 / 2 on V is [13] 

jvz-a* 


where k is the condition number 


y y/~K + 1 ) ’ 

Amax ( Pv FPy M | V ) 


^mm(FyFPyM\v) 

and A max and Amin are the maximum and minimum eigenvalues of operators on V. 


(20) 
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Abstract Framework 


The main idea of our convergence analysis is summarized in the following lemma, 
which we will apply to F and M from (IS) and (19). 

Lemma 1 Let L be a finite dimensional linear space with the inner product (•,•). 
Let F be a subspace of V. || • ||\- be a norm on F induced by an inner product, and. 
the dual norm be defined by ||(’||r' = sup- er (v. c)/||i’||r- Let Py : U — > V be the (•, •) 
orthogonal projection onto F. and F. M : U — > F linear operators symmetric on V, 

(A, FA) = (A. FA) VA.AgF 
(i'.Uv) = (v.Mv) Vv.be W 

and such that 

c i II A ||f '/ < (A. FA) < c 2 ||A||f, VA€ V 

C 3 II Ilf' < {v.Mij < c 4 ||i’||f Vr e F 

with constants C 1 .C 2 .C 3 .C 4 > 0. Then 

^max (PyMPyF) < C 2 C 4 

^ A min (PvMPvF) - Cl c 3 - 

Proof. Since A € F, we can replace in (21) F by PyF. From (21), the operator 
norm of the mapping PyF : V — > V and its inverse satisfies 

||TVF]|r'-u' < c 2 , ||(FrF)“ 1 || r _r' < (24) 

^1 

Similarly, (22) implies 

\\P v M\\y^y, < c 4 , ||(iVM)- 1 ||v/-v < (25) 

C 3 

Consequently, 

A m ax(A'd/FyF) < \\PvMPvF\\v'^V' < c 2 c 4 

and 

A ma fi{PvF)-\P v My l ) < ||(FvF)- 1 (F 1 / M)- 1 )||v^ < , 

cic 3 

which gives (23). □ 

The rest of this paper is concerned with estimating the condition number k 
from (23). We will specify a suitable norm || • ||y and estimate the constants in 
Lemma 1 for the finite element problem below. 


( 21 ) 

(22) 

(23) 
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Assumptions 


We need more specific assumptions in order to be able to prove a bound on the 
condition number k. So, we are solving the boundary value problem 

■Au = g in f2, u — 0 on <90 

where 



with a(x ) a measurable function such that 0 < cto < a(x) < a\ a.e. in f l. 

The domain f l is assumed to be divided into nonoverlapping subdomains f 2,-, 
i = l,...,n s , which can be generated from a reference domain (square or cube) 0 
of unit diameter as 0, = FfCli) by mappings Fj, which are assumed to satisfy 

ll&FiH < CH, WdF-'W < CH~ X 

with the Jacobian dF{ and the Euclidean IR d matrix norm ||.||. That is, the 
subdomains are shape-regular and have a diameter of 0(H). 

Assume that 14(0) is a conforming PI or Ql finite element space on a triangulation 
of 0, which satisfies the standard regularity and inverse assumptions. Denote by h 
the characteristic element size. Each subdomain f 1, is assumed to be a union of some 
of the elements, and all functions in 14(D) are zero on dfl. 

In particular, the degrees of freedom are values at nodes of the triangulation. We 
assume that B is defined as follows. For a pair of degrees of freedom w r (x a ) on <9 0 r 
and w s (x a ) on <90 s , such that the node x a does not belong to any other subdomain, 
let 

(Bw) rs (x a ) = cr rs (w r (x a ) - u; s (:r tt )), (26) 

where a rs = 1 or a rs = — 1. 

When node xp belongs to more than two subdomains = Si,S2, • • • ,s n?) we 

assume that (Bw) TS (xp) is defined so that B is full rank and so that the coefficients 
are ±1 and determined uniquely by the indices (si, s 2 , • • • , s n /3 ). For example, 

(Bw) kik+1 (xp) = (-l) k w Sk (xp) - (~l) k w Sk+1 (xp), k = l,..,np-l. (27) 

For an example of the definition of the values of B from (27) with (si, s 2 , 53) = (1, 3, 2) 
in 2D around a crosspoint, see Fig. 1. 

Remark 2 The essential property here is that there are no redundant constraints in 
enforcing the continuity of the solution at the nodes where more than two subdomains 
meet and that the constraints do not change along the edges (in 3D). Only the improved 
estimate in statement 3 of Lemma 8 will require the specific definition (27). 
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U! 3 (x a ) W 3 (X0) m(x y ) 


03 

Figure 1: Definition of B 
Discrete Norm Bounds 


The key to our analysis is a proper choice of norms. We equip the space W with 
the seminorm and the norm 




IIHItv - I w \w + 77 11, Iloilo, dn,i 


and the space V with the norm || ■ ||y and the dual norm || • ||y/ 


\\ v \\v = sup jpip. v e V. 
rev Mr 


For the definition and properties of the Sobolev seminorms | • l^o, see, e.g., [18]. The 
space U is identified. with some space JR n . We use the l 2 inner product (•, •) as duality 
pairing. 

In the following, we use a ~ b to indicate that ca < b < Ca with some positive 
generic constants c and C independent of the characteristic mesh size h and the 
subdomain diameter H. First we need to relate our discrete norm to a Sobolev norm 
and to establish equivalence of the norm and seminorm on the complement of the 
kernel of S. 


Lemma 3 |u>|j r ~ { w.Sw ). w € W . 


510 



Proof. The lemma follows from the standard result [2, 19] 

\ w i\H 1 / 2 (dQi) ~ ( w iiSiWi) 

by summation over all subdomains 0,- and using (28). □ 


Lemma 4 \w\w ~ ||iw||w, w € W, w J_ Ker S. 

Proof. From the equivalence of the H 1 norm and seminorm on the factorspace modulo 
constants [18] or from the Poincare inequality, and scaling from a reference domain 
to subdomain 0;, 

Iloilo, an,- — CH\ Wl \l /2tdQt 

for all W{ if dfli contains a part of dfl , and for all W{ such that f an . W{ = 0 otherwise. 
The lemma follows by summation over the subdomains and from (28). □ 


We also need the equivalence of the norm ||.Au||w and the seminorm |Au|vk- 
Lemma 5 |Au|^ ~ |[Au||w, v £ V . 

Proof. Let v £ V. Since A — | B', by definition of V, we have (Av,w) = 0 Vra € 
Ker S or Av T Ker S, which yields the result using Lemma 4. □ 

Our norm on V was chosen so that the preconditioner is coercive and bounded, 
i.e., so that (22) holds with c\ and c-i independent of H and h. This is shown in the 
following lemma. 

Lemma 6 (v,Mv) |M|y, Vu € V, 

Proof. For v € V, by definition of the preconditioner M, Lemma 3 and Lemma 5, 

(v,Mv) = (v,A'SAv) = ( Av,SAv ) ~ ||n||y 
Q 


The following lemmas lead to estimates of coercivity and ellipticity of F. We 
first summarize some well known results and inequalities in a form suitable for our 
purposes. 

Lemma 7 Let G be a vertex, edge, or face (if d = 3) of subdomain fl,-. A face is 
understood not to contain adjacent edges, and an edge does not contain its endpoints. 
For z G Vh(drti), define w G Vh(dfli) by w(x) = z(x) on all nodes x £ G; w(x) = 0 
on all other nodes of d£l{. Then, 

IMUri/qgn,) — ^(1 + ^°S (W z \\ 2 H 1 / 2 (dQ i ) + ~ff\\ z \\ 2 L 2 (9n i ))'> 
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where 


f3 = 1 if d = 2 and G is a vertex, or d ' = 3 and G is an edge or a vertex 
(3 = 2 if d = 2 one? G on edge, or d = 3 one? G is a /ace. 


Proof. The inequality for d = 2 was proved in [16, 17]. The case when d = 3 follows 
from Lemmas 4.1 and 4.2 in [3] if G is an edge or a vertex and Lemma 4.3 in [3] if G 
is a face (cf. [6] ).□ 


Lemma 8 It holds that 

inf |H|? r < G(1 + log(tf//0)lALM|? r , re € W, 

u»€ vv 
Bw= Bw 

where a = 1, and. a — 0 in the following special cases: 

1. BA = I, which means that there are no nodes shared by more than two 
subdomains. 

2. d = 2 and the matrix A has the following property: If w € Range A, x is a 
crosspoint (node shared by more than two subdomains), and wfix) = wfiy ) for 
all i such that x £ <9fh and. all nodes y that are adjacent to x on dfli, then 
W{(x) = 0 for all i such that x € <9f L;. 

3. d = 2, B is defined, by (26) and (27), and all nodes in the triangulation belong 
to either one, two, or an odd number of subdomains. 

Proof. Let us first prove that in the general case we obtain a < 1. Let w £ W and 
u = Bw throughout this proof. From the fact that BAfBA^u = u, and by the 
triangle inequality, 

inf ||tw||w < \\A(B A)~ l u\\w < ||.Au||w + || A(I — (B A)~ l )u^w ■ (30) 

wQW 

Bw=u 

Denote z = A(I — (BA)~ l )u. From the definition of B in (26), 2 is zero at all nodes 
that belong to at most two subdomains. The remaining nodes lie on crosspoints or 
edges (in the 3D case) of subdomains. From the definition of B, at every such node 
x, zfix) is a linear combination of the entries of Au that correspond to the same node 
x , and the coefficients of the linear combinations are bounded only in terms of the 
number of subdomains to which the node belongs. Using Lemma 7 for the crosspoints 
of subdomains, we obtain for the 2D case that 

II A(I-(BA)->\\w<C E ((A«),(x)y < C(l+H(.H/h))\\Auf w . (31) 

x crosspoint, x£dQi 
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In the 3D case, the argument for subdomain crosspoints is the same. In addition, we 
note that the coefficients of the linear combination do not change along a subdomain 
edge, so it remains to apply Lemma 7 on every edge. 

Let us now turn to the special cases that give a = 0. 

If BA = 7, we choose w = Au in the following and get 

inf^ ||tu||w < ||Atij|w as B{Au) = u, 

Bw=u 

which proves the special case 1. 

Now we prove special case 2. From the definition of the 77 1 / 2 norm [18] and the 
fact that Au is a piecewise linear function, it follows that 

WMw ^ X \ Au \ ?/2,8fi i ^ X ((Au) f (x) - ( Au)i(y )) 2 . (32) 

1 x crosspoint , x gSfl,- 

y adjacent to x,3/e<9fij 

For any crosspoint x, it follows from the assumption that for every w £ Range A, 

X (wi(x) - Wi{y)f = 0 =>• ]T (^'( a; )) 2 = °- 

• i,dUiBx 

y adjacent to x, y€.d£l{ 

Consequently, by compactness, and since there are only finitely many different 
numbers of subdomains sharing a crosspoint, 

X X (&i(z) - Wi(y)f , Vu) 6 Range A. 

i,dtti3x . i.BdiSx 

y adjacent to x, y€dCli 

By summation over all crosspoints x and using (31) and (32), we get 

\\A{I-{BA)- l )u\\ 2 w < C\\Au\\ 2 w , 
which concludes the proof of this case. 

In order to prove case 3, we verify the assumptions of case 2. We formulate 
only the proof for a crosspoint shared by three subdomains (Fig. 1). The proof 
is similar for a different odd number of subdomains. Let w £ Range A. Since 
w\{xp) — w\(x a ) = 0 and w\{xp) - W\(xs) = 0, we have uh(o: a ) = 101(0:5). Simi- 
larly, we obtain u) 2 (x a ) = w 2 (x~,) and u> 3 (xg) = w 3 ( x 7 ). Now w £ Range A implies 
toi(o:a) = — W 2 (x a ), w 2 (x-y) = —w 3 {xfi), and w 3 (xs) = —wi(xs), which can be satisfied 
only if wi(x a ) = 171(0:5) = ... = 0. 0 

Remark 9 In general, the exponent a = 1 in Lemma 8 cannot be improved. To see 
that, let us consider the configuration with values of u and Au in the neighborhood 
of a crosspoint as in Fig. 2; these values violate the assumptions of special case 2. 
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Figure 2: Counterexample. 

Extending the values of u in Fig. 2 to decay as log ^(t/H), 7 < 1/2, 
distance from the crosspoint, we obtain a function u £ U such that 

|| Au||iy ~ C , Il u llfl-i/ 2 (an 1 n9n 2 ) ~ I 1°S V H-V • 

If u = Bw, then on < 90 i fl dVt2, u = w 2 — wi, we obtain 

I w l// 1 / 2 (an 1 n9n 2 ) - K w ilff 1 / 2 (af2inan 2 ) + \ w 2\H 1 / 2 (dQ 1 ndQ 2 ) 

< \Wl\ H l/2^ dQl ) + \w 2 \ H l/2^ d Q 2 ) 

< IM|w 

so infBu, =u |Mlw > C(-y)\\ogh/ for all 7 < 1/2. 


Lemma 10 Let A E V . Then for all w E W, there is a w E 
ABw _L Ker S and 


(A, Bw ) 2 

Halin’ 


< C(l + log H/h ) 2 


(A, Bw ) 2 

\ABw\\ 2 w 


Proof. Let w E W be arbitrary, and put w = w + z where z E 
A-E V, we have 

(A, Bw) = (A, Bui). 

We would like to have ABw _L Ker S, which can be also written as 

{Bz,Bz) = ~{Bw,Bz) VzeKexS. 


where t is the 


W such that 


Ker S. Since 
( 33 ) 
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The bilinear form (B-,B-) is an inner product on the factorspace Ker 5'/ (Ker S n 
Ker B ), so by Riesz representation theorem we may conclude that there exists 
z G Ker S satisfying \\Bz\\ < ||5iw||. 

Now, from the definition of B and the norm in W, we obtain 

_ | \Bw\\ 2 < C\\wf < CH\\wf w . 

Also, since z G Ker S, it is constant on each <90;, and we have the following by 
Lemma 7 

\\ABzf w < C/H\\Bz\\ 2 (l + log H/h) 2 . 

Together this yields 

\\ABz\\ 2 w < C(l + \ogH/h) 2 \\w\\ 2 w . 

By the definition of A and B , (ABwfi on 50; U dftj is a linear combination (with 
bounded coefficients) of (a bounded number of) Wk from all dflk adjacent to 50;U50j. 
From Lemma 7, 


< £7(1 + log(H / h))\\w\\w , Vu; G W. 

Finally, summarizing, 

||A5ui||pv ||ABi«||w T ||A5z||^y < £7(1 -f log L7//i)||i0||jv. 

From this and (33), the result follows. □ 

We have now everything ready to prove the estimate (21). 

Lemma 11 c(l + log( J ff//i)) _a ||A||^, < (A, FA) < £7(1 + log(97//i)) 2 ||A||^, ; VA G V , 
with a defined in Lemma 8. 

Proof. From the spectral decomposition (14), define S' -1 / 2 = Then 

5+ = S'- 1 / 2 5- 1 / 2 , and for A G V, 

(A, FA) = (S + B'X,B'X) = {S- 1/2 B'X,S- 1/2 B'X) 

i, c-i/2 D / w ,2 (S-^B'X, x) 2 (B'X, S~^ 2 x) 2 

= IIS' 1/2 F'A|| 2 = sup A ’ = sup \ J 

X€W ® r «=«]+.! Fi + x 2 r 

xi €EKer S a?2-*-Ker & 

{B i X,S~ 1 / 2 x 2 ) 2 
= SU P n — iii 

x 2 €W, x 2 ±Ker S |F2|| 

since S~ l ! 2 xi = 0 and ||a:|| 2 = ||ari || 2 + ||x 2 || 2 . Now write any w G W as 
w = wi + w 2 , wi G Ker S, w 2 = S~ 1 ^ 2 x 2 ± Ker S. 

From the definition of V in (15), A G V implies that 

(B'X,wi) = 0. 
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Since 


lk 2 || 2 = (.r 2 . J 2 ) = (n’ 2 .Sw 2 ) « |tc 2 |tr ~ ||ir 2 ||rr 
from Lemma 3 and Lemma 4, it follows that 


(A, FA) 


(B'X, w 2 ) 2 

= , su p . i — c — r ~ su p 

it'oGU. u' 2 iKer 5 \ ^2 ^ ^ ^2/ u'GH’ 


(A, Bio) 2 

IHIfr 


Lemma 8 shows that 


sup 

we\v 


(A, Bit’) 2 


10 


2 

ir 


> 


(A, Bio) 2 ^ 1 (A ,Bw) 2 

3infs,=Bu.|MI?r ~ C(1 + log H/h) a \\ABw\\ 2 w 

1 (A ,Bw) 2 

C(1 + log H/h) a S,?- HABu.'H?/ 

-•UBu'-LKer S 


Lemma 10 yields an upper bound 


sup 

wew 


(A, Bi 


w 


1 2 
hr 


< C(1 + log H/h) 2 sup 

ti'gW 

dBu’iKer 


(A, Bw) 2 
\ABw\\ 2 w 


Finally, by definition of the norm || • ||r/. 


sup 

u’Gir 

i4Sty±Ker 5 


(A, Bw) _ (A, v) 

ABw\\w 


A Ik' 


since B spans V. □ 


Condition Number Estimate 


The final result now follows from the abstract estimate in Lemma 1 with the 
assumptions verified by Lemma 6 and Lemma 11. 

Theorem 12 The condition number of the FETI method with the Dirichlet 
preconditioner satisfies 


A Tnax{PvMPyF) 
A m in{PvMPvF) 


<c(i+io g |r 


with 7 = 3, and 7 = 2 in the special cases listed in Lemma 8. 
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A Systematic Solution Approach for Neutron 
Transport Problems in Diffusive Regimes * 

T. A. Manteuffelt K. J. Ressel* 


SUMMARY 


A systematic solution approach for the neutron transport equation, based on a 
least-squares finite-element discretization, is presented. This approach includes the 
theory for the existence and uniqueness of the analytical as well as of the discrete 
solution, bounds for the discretization error, and guidance for the development of an 
efficient multigrid solver for the resulting discrete problem. To guarantee the accuracy 
of the discrete solution for diffusive regimes, a scaling transformation is applied to 
the transport operator prior to the discretization. The key result is the proof of the 
U-ellipticity and continuity of the scaled least-squares bilinear form with constants 
that are independent of the total cross section and the absorption cross section. For 
a variety of least-squares finite-element discretizations this leads to error bounds 
that remain valid in diffusive regimes. Moreover, for problems in slab geometry a 
full multigrid solver is presented with V(l, l)-cycle convergence rates approximately 
equal to 0.1, independent of the size of the total cross section and the absorption 
cross section. 


1. INTRODUCTION 


The deterministic numerical solution of neutron transport problems becomes hard 
in diffusive regimes, which are characterized by very large total cross sections and very 

*This work was supported by the DOE under grant DE-FG03-93ER25165 and the NSF under 
grant DMS-8704169. 

^Program in Applied Mathematics, University of Colorado at Boulder, CB 526, Boulder, CO 
80309-0526 (tmanteuf Ssobolev . Colorado . edu). 

^Interdisciplinary Project Center for Supercomputing (IPS), Clausiusstrafie 59, RZ F-ll, ETHZ, 
CH-8092 Zurich, Switzerland (kjr@ips.id.ethz.ch). 
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small absorption cross sections. In these regimes the transport equation is nearly sin- 
gular and its solution in the interior of the computational domain is close to the 
solution of a diffusion equation. In order to solve diffusive transport problems nu- 
merically, it is advantageous to use a discretization for the transport operator that 
resembles a good approximation of a diffusion operator in diffusive regimes. In the 
past, special discretizations for transport problems in slab geometry have been de- 
veloped that have this property. Among them are the Diamond Difference scheme 
(Lewis and Miller [16]), the Linear Discontinuous scheme (Alcouffe et al. [2]) and 
the Modified Linear Discontinuous scheme (Larsen and Morel [15]). However, these 
discretizations have the disadvantage that either the solution of the resulting discrete 
system (Manteuffel et al. [17] [18]) or their extension to higher dimensions is difficult. 

In this paper we present a general framework for constructing discretizations of 
transport problems that are accurate in diffusive regimes. This framework, which 
is based on a least-squares variational formulation in combination with a scaling 
transformation, represents a systematic solution approach since it includes the theory 
for the existence and uniqueness of the analytical, as well as of the discrete, solution, 
bounds for the discretization error, and guidance for the development of an efficient 
multigrid solver for the resulting discrete problem. 

To introduce our notation we recall that the single group, steady state, isotropic 
form of the neutron transport equation is given by (Lewis and Miller [16]) 

f [0 • Y + — <J S P] 0) = q(l, 0) for (r, fi) € 7Z x S 1 

\ ip{r, 0) = g(,L, 0) for r € dTZ A n(r) • Q < 0 

where o t is the total cross section, a s is the scattering cross section, and ip(r, O) is the 
angular flux , to be determined for all points r = (x, y, z) in a region 71 C IR 3 with a 
sufficiently smooth boundary (for example of class C 1,1 (Grisvard [10, p. 5]) and all 
possible travel directions O on the unit sphere S 1 ). The operator P is defined by 

Pip(r,£L) :=-?-/ ^(r,Q') dfi', (1.2) 

47T J 
s 1 

which is an L 2 -projection onto the space of functions that are independent of direction 
angle Q. The boundary conditions specify the inflow of particles into the region 71, 
since n(r) denotes the unit outgoing normal at r G dTZ. Such problems arise as the 
inner loop of time-dependent, multienergy-group problems. 

In the case of slab geometry it is assumed that |^ = |^ = 0, so that ip(r, Q) = 
tp(z, fl) with (i := cos (6), where 9 denotes the angle between 0 and the 2- axis. Equa- 
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tion (1.1) reduces then to [16] 


d 

Vt- + cr t I 
oz 


<7,P 






= q(z, p) for ( z , p) E [z u z r ] x [-1, 1] 

= gi(fj.) forp>0 

= g r {p) for p < 0 


(1.3) 


Now, the operator P is defined by 



-l 


(1.4) 


which is an L 2 -projection onto the space of all functions that are independent of p. 

Without loss of generality, we assume in the following vacuum boundary conditions 
(g(r,Q) = 0 in (1.1) and gi(p) = g r ( p) = 0 in (1.3), respectively) and further that 
diam(7£) = 1 in (1.1) and \z r — zi\ = 1 in (1.3), respectively. Both assumptions can 
be established by a simple transformation. 

When a t — > oo and ^ — >■ 1, equations (1.1) and (1.3) become singular. Dividing 
(1.1) or (1.3) by a t results in the limit equation (/ — P)ip = 0. Therefore, the limit 
solution is independent of direction angle 0 and /i, respectively. Moreover, when 
a t — > oo and ^ 1 in a certain way, which is called the diffusion limit , it can 

be shown (Larsen [13]) that the limit solution converges to a solution of a diffusion 
equation. To be more specific, we introduce the absorption cross section a a := a t — a s 
and a small parameter e. The diffusion limit can then be defined as the limit e — >■ 0 
after scaling the cross sections and the source in the following way: 


q(r,Q) ^(zi,0), o t —> -, &a ea, 


(1.5) 


where a is assumed to be 0(1). In this parameterization the transport equation 
becomes 


£tp(r, D) := 


0 - V + -(/- P)+eaP il>{r,Q)=eq(r,n). 


(1.6) 


Using an asymptotic expansion in e it can be proven (Larsen [13], Pomraning [24]) 
that the solution of (1.6) has the diffusion expansion 


'/'(r. Q) = </> a (r) +£fe(r,2), 


(1.7) 


where (j ) o is, at a few mean free paths away from the boundary, a solution of the 
diffusion equation 

-V • ^V^o(r) + a^ Q (r) = Pq{r,tt). (1.8) 

For the following analysis of a least-squares finite-element discretization of the trans- 
port equation (1.1) we use the form of the transport operator in (1.6). 


521 



This paper is organized as follows. In Section 2, we describe briefly the least- 
squares finite-element discretization. Further, we introduce and motivate in this 
section a scaling transformation that is applied to the transport operator prior to 
the discretization in order to ensure the accuracy of the discrete solution for diffusive 
regimes. In Section 3, we state that the scaled least-squares bilinear form is continuous 
and -elliptic in a certain norm with constants independent of e and a. The existence 
and uniqueness of the analytical, as well as of the discrete, problem then follows 
directly from the Lax-Milgram Lemma [7]. 

Furthermore, the continuity and the R-ellipticity, in combination with Cea’s 
Lemma [7], are the basis for discretization error bounds that are established in Sec- 
tion 4 for a variety of conforming finite-element spaces. Since the continuity and the 
R-ellipticity constants are independent of e and a, these error bounds remain valid 
for diffusive regimes. Thus, the least-squares discretization of the scaled transport 
equation with simple conforming finite-elements yields an accurate discrete solution, 
even in diffusive regimes. In Section 5, we describe a full multigrid solver for problems 
in slab geometry and present some convergence rates. Finally, in Section 6 we draw 
some conclusions. 


2. SCALING TRANSFORMATION 


Let us denote the standard inner product and associated norm of L 2 (TZ x S' 1 ) by 

(u,v) := J Ju-v*d£ldr ; IMI := \J ( u -> u ) Vu, v E L 2 (TZ x S 1 ), (2.1) 

n s 1 


where v* is the complex conjugate 1 of v. Further, let V be a Hilbert space with 
underlying norm ||-||y, which we will specify later. Then, the least-squares variational 
formulation of (1.1) is given by (see (1.6)) 


mm F(ip), 

ipev 


with 


F WO : = J J \Cip(r,Q) - q(r , ! 


dVtdr. 


n s 1 


( 2 . 2 ) 


In order for ^ G F to be a minimizer of the functional F in (2.2), a necessary condition 
is that the first variation of F must vanish at tp for all admissible v EV, which results 
in the following problem: find ip eV such that 

a(ip , v) := (Zip , £u) = (q,£v^ Vu € V. (2.3) 

For the least-squares finite-element discretization of (2.2), the Hilbert space V is 
replaced by a finite dimensional subspace V h C V. This leads to the discrete problem: 

1 We allow here complex valued functions, since we use in Section 4 the expansion of v into 
spherical harmonics. 



find ifth E V h such that 


a(gp h ,v h ) = (q,Cv h ) Vv h eV h . (2.4) 

By an asymptotic analysis it was shown in [19] and [25] for slab geometry and V h 
formed by piecewise linear basis functions in space and a finite number of Legendre 
polynomials as basis functions in angle that this direct least-squares approach is not 
accurate in diffusive regimes. This can also be explained by the following heuristic 
argument. Because of the diffusion expansion (1.7) the important component of the 
solution i[> in diffusive regimes is the part that is independent of direction angle 0, 
which is given by Pip. On the other hand, the component (7 — P) ip of the solution is 
irrelevant in diffusive regimes. By Cea’s Lemma [7], the solution of the least-squares 
discretization can be viewed as the best approximation to the exact solu tion in the 
discrete space V h with respect to the semi- norm yja{-, •) := \J < £•, £• >. However, 
the different terms in the operator £, as defined in (1.6), are unbalanced (there are 
0(1), 0(1) and 0(e) terms), so that different components of the approximation error 
are weighted differently in •). The leading term of £ is 1(7 - P), which means 
that the part of the error that is dependent on angle is weighted in this norm very 
strongly in diffusive regimes (very small e), even though this part is irrelevant. On the 
other hand, the part of the error that is independent of angle, whic h is the important 
part in diffusive regimes, is hardly measured in the semi-norm ya(-, •), since it is 
weighted by e. 

The idea is to scale equation (1.6), thus changing the weighting in the norm used 
in the least-squares discretization, which, in turn, alters the choice of the element 
of the discrtete space as an approximation to the exact solution. Let us define the 
following scaling transformation and its inverse: 

S:=P + e(I-P), S~ l = P+^{I-P). (2.5) 

Clearly, applying the scaling transformation S from the left to the transport equation 
prior to the least-squares discretization will increase the weight of the important error 
component and decrease the weight for the irrelevant component. After applying the 
scaling transformation S from the left and dividing by e, equation (1.6) becomes 

Cip := -SCtp = -SQ. -Vip + -(I - P)ip + aPxp = q s , (2.6) 

£ £ £ 

with q s := Sq. 

Equation (2.6) can be balanced further by applying the scaling transformation S 
also from the right. Let the domain of operator £ in (2.6) be the Hilbert space V. 
Then, we define a space V by 

V := S^V, (2.7) 
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so that 

v = S~ 1 v and Sv = v (2.8) 

for all v E V and v E V. Scaling (2.6) from the right results in 

CSS~ l ip = CSi> = Q • V_4> + (/ - P)$ + aPip = q s , (2.9) 

where 

Q := -SQ.S = (1 - e) (PQ + QP) + sQI. 

— £ 

In the double-scaled operator CS in (2.9) the derivative of zeroth moment (P'Vjp), 
the derivative of the first moments (PQ, ■ V_ip) and all components of ip themselves 
are weighted equally. Moreover, it is easily seen that the double-scaled operator CS 
goes to a bounded nonsingular limit operator as e — > 0. 

In the least-squares context, the additional scaling from the right can be avoided 
because 


min (CSip — q s , CSip — q s ) 4=^ min (Cip — q s , Cip — q s ) , (2.10) 

fev x / ^ eV 

which will simplify the boundary conditions and so the computations. However, for 
the theory we exploit the nice form of the double-scaled operator CS and use this 
form of the transport operator as a tool. 

The least-squares variational formulation of the single-scaled equation (2.6) is 
given by the problem: find ip E V such that 

a(ip,v) := {Cip,Cv) = (q s ,Cv) Vw E V. (2-11) 

For the sake of completeness we remark that for slab geometry the form of the 
scaling transformation S, as defined in (2.5), remains the same, except that for P the 
definition (1.4) has to be used. In the case of slab geometry, therefore, equation (2.6) 
reduces to 

Cip := -SCip = -S/jl-^- + - (I — P)ip + aPip = q s . (2.12) 

£ £ OZ £ 


3. CONTINUITY AND V-ELLIPTICITY 


In this section we summarize without proof that the scaled least-squares bilinear 
form (2.11) is continuous , i.e., there exists a constant C c > 0 such that for every 
u,veV 

|a(u,u)| < C c |M|v |M|v, (3.1) 
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and V-elliptic , i.e., there exists a constant C e > 0 such that for all v G V: 


a(v,v) > C, IMIJ. 


(3.2) 


The Hilbert space V and its norm l|*|| y are specified below. It is crucial to prove these 
bounds with constants C e pnd C c that are independent of e and a, since this makes it 
possible to establish discretization error bounds that remain valid in diffusive regimes. 

We first consider the slab geometry case. Let D := [zi, z r \ x [—1,1] denote the 
computational domain and let (•,•) and ||-|| denote the standard inner product and 
the associated norm of L 2 (D), which are defined by 

tCy 1 

{u, v) := J J u- v dpdx and ||it|| := yj ( u , u ). 

Xi -1 


An appropriate norm for bounding the least-squares bilinear form a(-, •) is then given 
by the norm 


M v '■= 


1 dv 

2 

1/ 

-5/x— 

£ OZ 

+ 

-(/ — P)v 
£ 


+ \\Pv\f 


(3.3) 


The Hilbert space V can then be defined by 


V := G C°°(D ); v(zi , p) = 0 for p > 0; v(z r , p) = 0 for p < o|, (3.4) 

where the closure is taken with respect to the norm || • ||y. 

Prom the Cauchy-Schwarz inequality and discrete Holder inequality it is easy to 
obtain that for all u, v G V 


I a(u, u)| = | (Cu, Cv ) | < ||£u|| ||£u|| < 3 ||u|| v ||u|| y . (3.5) 

Thus, the bilinear form (2.11) is continuous with respect to the norm ||-|| y with 
C c = 3. 

The proof of the V-ellipticity is much harder and requires several technical lemmas. 
For a proof of the following theorem we refer the reader to [20] and [25]. 


Theorem 3.1 (F-ellipticity of a(-, •) ) Suppose that 0<a<l, 0<£< Let 
a(-,-) and ||'|| y be given as in (2.11) and (3.3) respectively. Then, there exists a 
constant C e > 0 such that for all v G V, 

a(v,v) > C e |M| y, . (3.6) 

where C e = 0.012, which is independent of e and a. □ 
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In the case of x-y-z- geometry we let D := 71 x 5 1 and generalize the definition of 
•|| v in (3.3) and the Hilbert space V in the following way: 



-SO • Vw 

£ 


+ 


-(I-P)v 

£ 


+ ||P»|| 2 . 


(3.7) 


V := ju G C°°{D ); v(r, 0) = 0 for r G &R, and 0 ■ n (r) < o|, (3.8) 

where the closure is now taken with respect to the norm ||-|| y in (3.7) and |j-|| denotes 
the norm in (2.1). The continuity (3.5) and the U-ellipticity (3.6) hold then with 
exactly the same constants C c and C e as in the slab geometry case. 

Together with the Lax-Milgram Lemma [7] the existence and the uniqueness of a 
solution for problem (2.11) and its discrete version (4.1), where V is replaced by a 
finite dimensional subspace V h C V, follows directly. In the next section we will use 
the continuity and the U-ellipticity of the bilinear form a(-, •) to prove discretization 
error bounds for a variety of discrete spaces V h . 


4. DISCRETIZATION ERROR BOUNDS 


In this section we establish bounds for the discretization error ip — ip^. Here, 
ip £ V denotes the solution of (2.11) and iph G V h C V denotes the solution of the 
corresponding discrete problem: find iph € V h such that 

a(ip h ,v h ) = {q s ,£vh) Vv h eV h . (4.1) 

The continuity and the U-ellipticity of a(-, •) lead directly to Cea’s Lemma [7]: 

a(i/> $ h ) < ai'ip -v h ,ip- v h ) \fv h e V h (4.2) 

or 

U ~ iph\\ v < \h5- min \\ip - v h \\ v . (4.3) 

y ^ e Vhtv 

Therefore, bounding \\ip — iph\\ v is reduced to the problem of bounding min \\ip — Vh\\ v , 

Vh£V h 

which is a problem of approximation theory and depends on the space V h . Here we 
consider discrete spaces V h that are formed by functions that can be expanded into 
the first N Legendre polynomials (spherical harmonics in the case of x-y-z-geometry) 
with respect to the direction angle fj, (0) and are piecewise polynomials of degree k in 
z (r) on a partition % of the slab [z*, z r ] (region 71). This class of finite dimensional 
subspaces corresponds to a discretization by a spectral method in angle and a finite- 
element discretization in space. The spectral discretization in angle with Legendre 
polynomials (spherical harmonics) is common for transport problems [16] and also 
called a P^-discretization. 
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Again, we consider the slab geometry case first. Let Th = {zi =: z 0 , z \, . . . , z m := z r } 
be a partition of the slab [zi,z r \ with maximum mesh size h and let IP k (Th) denote 
the space of piecewise polynomials of degree < k on the partition Th- Further, let 
Pi{p) denote the l - th Legendre polynomial. The normalized Legendre polynomials 
Pi(p) := \/ 2 1 + 1 Ppp) form then an orthonormal basis of L 2 ([— 1 , 1 ]). Thus, any 
ip G V has the following, expansion in angle, 

OO 

ip(z,p) = M z )pi(p), ( 4 - 4 ) 

1=0. 

where the Fourier coefficients <pi(z), which are called moments in transport theory, 
are given by 

1 } 

M z ) = 2 J ( 4 - 5 ) 


For the discretization we truncate the expansion in (4.4) and approximate the 
moments <pi{z) by piecewise polynomials on the partition Th- This results in the 
discrete space 


JV-X 


V h := \v h e C°m v h =Yl e JP r (Th) for l = 0, . . . , N - 1; 


1=0 


V h {z h p) = o for p > 0, V h (z r , p) = 0 for ji < 0 > . 


(4.6) 


Let | • |^o denote the standard semi norm of H u ([zi, z r ]) x L 2 ([— 1, 1]). Combining 
Cea’s Lemma, standard finite-element approximation bounds and using the fact that 
the Legendre Polynomials are eigenfunctions of the Sturm-Liouville operator [9, p.21], 
that is, 


PsPi(p) ■= 


d_ 

dp 



1(1 + l)pi(p), 


the following discretization error bound can be established (see [20] and [25]). 


Theorem 4.1 (Discretization Error bound for slab geometry) Suppose 0 < a < 1 
and 0 < e < Let ip G V fl [H k+l ([zi, z r ]) x H 2 ([— 1,1])) be the solution of 
(2.11) with q s G H k ([zi,z r ]) x H 2 ([— 1,1]). Further, let iph G V h be the solution 
of (4.1) with V h defined as in (4.6). Assume that if has the diffusion expansion 
ip(z, p) = <j>o(z) + ep R {z, p). Then, 


W'P - iphWv < 


1 (C x 
VC e \N 


|£s<7s|| + 


Ls 


dip 

Ph 


(4.7) 


+ 



{^3^ (l^oljfe+1,0 + t^lfc+l,o) + e h } e h , 
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with Ci,C 2 ,C 3 independent of a and e. In particular 


P p 


d(i> - iph.) ' 
dz 


< ee h , 




9(ip - iphY 
dz ■ 


< e h , 


\{I - P)(ip - 4>h)\\ < ee h , 

\\P(ip - i/) h )\\ < e h . □ 


For the definition of the boundary error ef we refer to [20]. However the following 
remark explains the source of this error. 


Remark 4.2 (Treatment of Boundary Conditions) In order to have V h C V, which 
is necessary for Cea’s Lemma, we incorporated the boundary conditions in the defi- 
nition (4.6) of the discrete space V h . However, in conjunction with a P N discretiza- 
tion in angle, these boundary conditions can only be fulfilled by a discrete function if 
4>ii z i) = <f>i(z r ) = 0 for £ = 0, 1, ... , N — 1. Therefore, the boundary conditions for the 
discrete problem are really given by v h {z h p) = v h (z r , p) = 0 for p e [-1,1]. The dif- 
ference to the real boundary conditions (v(zi, p) = 0 for p e [0, 1] and v(z r , p) — 0 for 
p € [—1, 0]) is measured in the error bound (4.7) by the term ef . In diffusive regimes, 
where the analytical solution is nearly independent of p, we have that v(zi,p ) « 0 
for p E [— 1, 0] and v(z r ,p) « 0 for p e [0, 1], so that ef will be small. However, for 
nondiffusive problems, it is not, in general, true that the inflow of particles is nearly 
equal to the outflow. In this case, ef will, in general, be large. 

One way to avoid this difficulty would be to use nonconforming finite element 
subspaces, that is, to require that functions in the discrete subspace obey Mark or 
Marshak boundary conditions [8]. Since then V h (f V, Strang’s Lemma [6] instead of 
Cea’s Lemma must be used in order to establish error bounds. 

Another, more natural, way to address this issue would be to incorporate the 
boundary conditions directly into the least-squares functional. For example, one 
could add to the bilinear form a(-, •) in (2.11) the boundary form 



and use a discrete space with functions that are free of any boundary constraint. 
Error bounds based on this approach will appear in a forthcoming paper. □ 

Remark 4.3 (Nondiffusive regimes) In order to get an error bound in (4.7) with 
a constant that is independent of parameter e it is assumed in Theorem 4.1 that 
the analytical solution has a diffusion expansion. For regimes, where the diffusion 
expansion is not valid, ^ is of moderate size, so that there is no need for an error 
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bound that is independent of e. In this case the second term on the right hand side 
of (4.7) simplifies to 

\c z h k ^1 + - ) M fc+li0 + ef | • 

However, we point out that this bound will blow up in diffusive regimes, where ^ 
becomes very large. □ 



Now, we generalize the error bounds for slab geometry to x-y-z geometry. Let Th 
be a triangulation of 1Z into thetrahedrons of maximum diameter h. Recall that the 
spherical harmonics [3, p. 571] are defined by 

Yr(e,v) : = (-ira, m pr(cos(0))e^, 


for l > 0 and — l <m<l , where 


Q, 


m 


N 


(2l + l)(l-m)\ 


P™ denotes the associated Legendre polynomials, and 9 denotes the polar angle with 
respect to the 2 -axis, while ip denotes the azimuthal- angle about the 2 -axis. The 
spherical harmonics form an orthonormal basis of L 2 (S l ). Therefore, any v £ L 2 (1Z x 
S 1 ) has an expansion of the form 


OO l /. 

>(r,B) = 5;E Wi)W With (Mr) = y v(r,Q)Yr(a)dCl. (4.8) 

1=0 m=—l gl 

Similar to the slab geometry case, we truncate this expansion for the discretization 
and approximate the moments by a function 6 IPk(Th), where IPk(Th) 
denotes the space of piecewise polynomials of degree < k on the triangulation Th- 
Thus, we define the following class of discrete spaces: 


V k ~{v„eV: »„(!:, a)=X : E (4.9) 

1=0 m=—l 


which correspond to a finite-element discretization in space and a Pn discretization 
[16] in angle. 

Let | • |/c+i,o denote the semi norm of H k+1 (7V) x L 2 (S <1 ). As in the slab geometry 
case, we combine Cea’s Lemma, standard finite-element approximation bounds and 
use the fact that the spherical harmonics are the eigenfunctions of the Laplacian 
operator An on the unit sphere to obtain the following discretization error bound 
(see [20] and [25]). 
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Theorem 4.4 (Discretization Error Bound for x-y-z geometry) Suppose 0 < a < 1 
and 0 < e < ^=. Let if G V fl ( H k+1 (7Z ) x H 2 ^ 1 )^ be the solution of (2.11) with 
q s G H k (lZ) x H 2 ^ 1 ). Further, let iph € V h be the solution of (4.1) with V h defined 
as in (4.9). Assume that if has the diffusion expansion (1.7). Then, we have: 


~ My < 


(ll A n?*ll + l A n^li,o) 

+ \f^{ C2hk (l^ 0 !fc+i,o + l^filfc+l,o) +e h} > 

V 


(4.10) 


with Ci and C 2 independent of e and a. □ 


5. MULTIGRID SOLVER 


The accuracy of the least-squares discretization in combination with the scaling 
transformation for diffusive transport problems has been demonstrated numerically 
in [19], [25] and in [20]. In this section we restrict the presentation of numerical 
results to a full multigrid solver for problems in slab geometry. We refer the reader, 
who is not familiar with multigrid methods to (Briggs [5]) for an introduction and to 
(Hackbusch [11]) and (McCormick [21] [22] [23]) for more advanced topics. 

The proper choice of the components, namely, the inter-grid transfer operators, 
coarse grid problems, and relaxation schemes, is essential for the efficiency of a multi- 
grid solver. The choice of the first two components is naturally given by the least- 
squares variational formulation. The sequence of discrete spaces L C U C ■ • • C 
Vi = V h determines the coarse grid problems since they are just the restriction of the 
variational problem to these discrete subspaces. The prolongation operator, which is 
a mapping from a coarse grid to the next finer grid in the grid sequence, is formed 
directly by composing the isomorphisms between the discrete spaces and their corre- 
sponding coordinate spaces with the injection mapping between I4_i and 14 (Bram- 
ble [4]), (McCormick [23]). The restriction operators, which are mappings from a 
finer grid to the next coarser grid, are just the adjoints of the prolongation opera- 
tors. Therefore, the only multigrid components that need to be chosen here are the 
sequence of discrete spaces and the relaxation. 

For the discrete subspaces, we use finite-element spaces with linear basis elements 
on increasingly finer partitions (halving the spatial cells) of the slab. 

As relaxation we employ a line moment relaxation that updates all moments 
simultaneously for a given spatial point. Our computational tests showed essentially 
no differences in the error reduction and smoothing properties of this line relaxation 


530 



Table 5.1: Multigrid convergence factors. 


V(l, l)-cycle 

Ot 

a = 1.0 

a = 0.5 

a = 0.25 

a = 0.1 

a = 0.0 

10° 

0.052 

0.086 

0.083 

0.118 

0.169 

10 1 

0.091 

0.092 

0.091 

0.117 

0.136 

10 2 

0.056 

0.056 

0.071 

0.106 

0.131 

10 3 

0.092 

0.093 

0.092 

0.105 

0.127 

10 4 

0.095 

0.094 

0.094 

0.106 

0.129 

10 5 

0.095 

0.094 

0.093 

0.107 

0.130 

10 6 

0.095 

0.092 

0.092 

0.107 

0.130 

10 7 

0.095 

0.092 

0.092 

0.107 

0.130 

10 8 

0.095 

0.092 

0.092 

0.107 

0.130 

10 9 

0.095 

0.094 

0.092 

0.107 

0.130 

10 10 

0.095 

0.094 

0.092 

0.106 

0.130 


scheme for various different orderings of the spatial points. To save computation, we 
use this line relaxation scheme in a red-black fashion, since then the residual after 
one relaxation sweep is zero at the black points and need not be computed for the 
restriction to the next coarser grid. This scheme is also more amenable to advanced 
computer architectures. 

The convergence rates for a V(l, l)-cycle of this multigrid algorithm, which uses 
one relaxation before and one relaxation after the coarse grid correction, are listed in 
Table 5.1. Even for values of a t = 1/e > 10 6 , we get V(l, l)-cycle convergence factors 
of order 0.1. These convergence factors are sufficient to get a solution with an error 
on the order of the discretization error by one single full-multigrid cycle. 


6. CONCLUSION 


The least-squares finite-element discretization with piecewise linear basis func- 
tions in space directly applied to the neutron transport equation does not yield a 
correct discrete solution in diffusive regimes. However, in combination with a scaling 
transformation applied to the transport operator prior to the discretization, the least- 
squares discretization is accurate for diffusive regimes and represents a systematic, 
general, solution approach. 

This approach, which converts the first order transport problem into a variational 
form with a symmetric bilinear form, is systematic because it includes the theory for 
the existence and uniqueness of the analytical as well as for the discrete solution, 
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bounds for the discretization error and guidance for the development of an efficient 
multigrid solver for the resulting discrete system. 

The key results are the -ellipticity and the continuity of the scaled least-squares 
bilinear form with constants independent of e and a. They make it possible to estab- 
lish error bounds that remain valid in diffusive regimes. Together with the freedom 
to choose a discrete space, this approach yields a general framework for finding dis- 
cretizations for the transport equation that are accurate in diffusive regimes. 

Because of its generality, this approach opens many possibilities for future work. 
The use of different discrete spaces can be explored. For example, one may consider 
finite-elements as basis functions for discretization of the angle dependence instead 
of Legendre polynomials or Spherical Harmonics. The boundary conditions could 
be incorporated directly into the least-squares functional, which would be a more 
appropriate treatment of the boundary conditions. Adaptive refinement could be 
combined with the multigrid solver in order to resolve boundary layers. Finally, it 
appears that it is possible to generalize the scaling transformation to anisotropic 
transport problems. 
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First-Order System Least-Squares for Second-Order Elliptic 
Problems with Discontinuous Coefficients 


Thomas A. Manteuffel Stephen F. McCormick Gerhard Starke* 


Abstract 

The first-order system least-squares methodology represents an alternative to stan- 
dard mixed finite element methods. Among its advantages is the fact that the finite 
element spaces approximating the pressure and flux variables are not restricted by the 
inf-sup condition and that the least-squares functional itself serves as an appropiate er- 
ror measure. This paper studies the first-order system least-squares approach for scalar 
second-order elliptic boundary value problems with discontinuous coefficients. Elliptic- 
ity of an appropriately scaled least-squares bilinear form is shown independently of the 
size of the jumps in the coefficients leading to adequate finite element approximation 
results. The occurrence of singularities at interface corners and cross-points is discussed, 
and a weighted least-squares functional is introduced to handle such cases. Numerical 
experiments are presented for two test problems to illustrate the performance of this 
approach. 


Introduction 

The purpose of this paper is to apply the first-order system least-squares approach 
developed in [4] and [5] to scalar second-order elliptic boundary value problems in two 
dimensions with discontinuous coefficients. Such problems arise in various application 
areas, including flow in heterogeneous porous media (see, e.g., [12]), neutron transport 
[1], and biophysics [7]. In many physical applications, one is interested not only in an 
accurate approximation of the physical quantity that satisfies the scalar equation, but 
also in certain of its derivatives. For example, fluid flow in a porous medium can be 
modelled by the equation 

— V • (aVp) = / (1) 

for the pressure p, where the scalar function a may have large jump discontinuities across 
interfaces. Of particular interest here is accurate approximation of the fluid velocity 

u = aVp , (2) 

a concern which led to the development of mixed finite element methods (see, e.g., [3, 
Chapter 10]). In mixed methods, both p and u are approximated by not necessarily 
identical finite elements and, roughly speaking, a Galerkin condition is imposed on the 
first-order system resulting from (1) and (2). 

An alternative to mixed finite elements is the first-order system least-squares ap- 
proach developed and analyzed, e.g., in [4], [5], [11], and [10]. This methodology re- 
places the Galerkin condition by the minimization of a least-squares functional associ- 
ated with a first-order system derived from (1) and (2). Augmenting the basic system 
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with the curl-condition V x (u/a) = 0 (see [5], [10]) leads to ellipticity with respect 
to the H 1 (fi) norm in the individual variables. Important practical advantages of this 
least-squares approach over standard mixed methods are: (i) the finite element spaces 
approximating the pressure and flux variables are not restricted by the inf-sup condition 
of Ladyzhenskaya-Babuska-Brezzi (cf. [3, Section 10.5]) and (ii) the least-squares func- 
tional serves as an appropriate error measure. Moreover, if the problem is sufficiently 
regular (e.g., if a £ C 1,1 (fi) and 11 has certain properties (cf. [5])), then (iii) optimal 
accuracy is guaranteed in each variable, including the velocities, in the H 1 norm and 
(iv) optimal computational complexity for the solution of the resulting discrete systems 
is achieved with standard multigrid methods (see [5]). 

For problems with discontinuous coefficients, which is our focus in this paper, the 
velocity components will, in general, not be in f/ 1 (fl). While the theory developed in [4] 
and [5] already allows for discontinuous coefficients, special care must be taken in order 
to prove ellipticity, in an appropriate norm, with constants independent of the size of 
the jumps. For this purpose, an appropriate scaling of the least-squares functional that 
depends on the size of a in different parts of the domain is introduced. This results 
in ellipticity, independently of the size of coefficient jumps, and consequently in finite 
element approximation results, with respect to a norm that is suitably scaled depending 
on the size of a. This scaling is presented in the following section. 

At interface corners and cross-points (i.e., where two smooth interface components 
intersect), the components of u will, in general, be unbounded, and singularities natu- 
rally arise (see, for example, Strang and Fix [14, Ch. 8]). The shape of these singularities 
is determined by the angle at an interface corner (or between two intersecting interfaces) 
and the jumps in the coefficients. We will show how the parameters describing these 
singularities can be computed from the coefficient jumps and corner angles. We are par- 
ticularly interested in the exponent associated with the singular function at a corner or 
cross-points since this indicates how much we have to unweight the least-squares func- 
tional in the neighborhood of such a point. The performance of this scaled least-squares 
approach will be studied using bilinear finite elements for the pressure and fluxes (based 
on the same grid) and a full multigrid algorithm for the solution of the resulting discrete 
system. Finally, computational experiments for two test problems are presented. 

Our restriction to two-dimensional problems is mainly for the purpose of exposition. 
However, some technical complications arise for three-dimensional problems. For ex- 
ample, two different types of singularities, associated with edges and with corners or 
cross-points, arise in three dimensions. We do not examine this in the present paper. 


The Least-Squares Functional 


Consider the following prototype problem on 

a bounded domain fi C 3R 2 : 


-V-(aVp) = /, 

in £1 , 


P = 0 , 

on T d , 

( 3 ) 

S3 

<3 

II 

O 

on T n , 



where n denotes the outward unit vector normal to the boundary, / £ L 2 (fi), and 
a(.Ti , X 2 ) is a scalar function that is uniformly positive and bounded in f2 but may have 
large jumps across interfaces. We assume that r.o 0, so that the Poincare-Friedrichs 
inequality 

|]p||o,n < 7l|Vp||o,n (4) 

holds and (3) has a unique solution in H 1 ( f2). Following [5], we rewrite (3) as a first- 
order system by introducing the flux variable u = aVp:. 


u — aVp = 0 , 

-V-u = /, 

p = 0 , 

n u = 0 , 


in , 
in , 
on Fd , 
on Tiv • 


( 5 ) 
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Since u /a = Vp with p £ ff 1 (f2), then we have (cf. [6, Theorem 2.9]) 

V x (u /a) = di{u 2 /a) — ^(wi/a) = 0 , in fi . 

Moreover, the homogeneous Dirichlet boundary condition on T# implies the tangential 
flux condition 

n x (u fa) = (niu 2 — n 2 u\)/a = 0 , on Tc . 

Adding these equations to first-order system (5) yields the augmented system 


u — aVp — 

0 , 

in fl , 

-V -u = 

/, 

in fl , 

V x (u/a) = 

0, 

in , 

P = 

0, 

on T d , 

n • u = 

0, 

on r N , 

n x (u/a) = 

0, 

on Td . 


In addition to L 2 (fl) and H l (Q.) with the respective norms || • Hc^n and || ■ H^n, we will 
need the spaces 

7I(div; fi) = {v £ L 2 (fi) 2 : V • v £ L 2 (f2)} , 
i?(curla;fl) = {v £ i 2 (fi) 2 : V x (v/a) G L 2 (Q)} 

and 

V = {q€H\Q) :q = 0onT D }, 

W = {v £ iJ(div; f2) fl fl(curla; fi) : n • v = 0 on Tjv , n x (v/a) =: 0 on T/j} . ' ’ 

Clearly, for the solution of (3), we have p £ V and u £ W, so it is appropriate to pose 
(6) on these spaces. 

As mentioned above, our main interest is in the solution of (3) when a(aq,a; 2 ) has 
large jumps. Following Bramble, Pasciak, Wang, and Xu [2], we assume that 

J 

O = 

i = 1 

with {£!,-} being mutually disjoint open polygonal regions; that the restriction of 
a(x i, x 2 ) to O; is in C 1 (fl ! ); .and that 

c\Wi < a(x i , x 2 ) < c 2 u>i for (aq , x 2 ) £ Qi 

with constants ci,c 2 of order one and arbitrary positive constants w». In other words, 
a(aq, x 2 ) is assumed to be of approximate size uq throughout for each i while large 
variations in {uq} over i are allowed. The bounds derived below will be independent of 
this variation in {uq}, but the constants in these bounds will depend on the variation 
within each fl t -, that is, on Ci and c 2 . 

An appropriate scaling of the equations in (6) leads to the least-squares functional 
G(u,p;f) = ||u/Va - VaVpIl^n-b ||V - u + /||^ n + ||a V x (u/a)||^ n (8) 

and associated bilinear form 

^(u, p; v, q) = (u /y/a - V«Vp, v/^ - ^/aVq ) 0>n /qs 

+(V -u, V-v) 0 ,n + (aV x (u/a),aVx (v/a)) 0| n ■ ' ’ 

Here, for the sake of notational simplicity, we agree that (•, )o,n is meant componentwise 
for vector functions. That is, if w = (uq, w 2 ) and z = (zi, z 2 ), then 

(w, z) 0 ,n = (uq , zi) 0 ,n + (™ 2 , 22 ) 0,0 • 


537 



The solution of (5) will also solve the minimization problem 


G{u,p-f)= min G(v,q-f) (10) 

(v,?)e WxK 

and, therefore, the variational problem 

^■(u,p;v,g)= -(/, V-v) 0 ,n for all (v, q) £ W x V . (11) 

Here we show that Tfv,q] v,q) is uniformly equivalent to the scaled norm defined for 
(v, q) £ W x V by 

lll(v, ?)lll = (IIV • v|(§ n + ||aV x (v/a)||^ n + ||v/^|| 2 n + ||\/aVg||^ n ) 1/2 . 

Theorem 1 Under the above assumptions, there exist constants 71 and 72, independent 
of the size of the jumps in {cj,-}, such that 

•^(u,p;u,p) > 7i|ll(u,p)||| 2 for all (u, p) £ W x V (12) 

and 

^(u,P;v,9) <72|||(u,p)||||||(v,?)||| for all(u,p) , (v,g) £ W x V . (13) 


Proof. The proof is similar to the proof of [4, Theorem 3.1] (see also [10, Theorems 
2.1 and 2.2]). We include it here because we must confirm that the constants 71 and 72 
are independent of the jumps in a. The main part of the proof consists in showing that 
the functionals 

T(u,p\v,q) = (u/Va - \/o.Vp, v/ \fa — \faVq)o,n + (V -u, V • v) 0 ,n 

and 


S(u,p;v,q) = (u/a/o, v/v / a)o,n + (\/aVp, VaVq) 0i n + (V • u, V • v) 0 ,n , 


satisfy 

ciS(u,p;u,p) < T(u,p\u,p) (14) 

and 

•^(u.p; v, q) < c 2 (.S(u,p;u,p)) 1/2 (<S(v, 9 ;v,5)) 1/2 (15) 

with constants c\ and c 2 that are independent of the jumps in a. 

For the proof of (14), we rewrite Poincare- Friedrichs inequality (4) as 

Ibllo.n < TllVaVpI^n . (16) 

Note that 7, and consequently the quantity 71 in (12), depends on min x6 n a(x) > 0. It 
does not introduce, however, any dependence of (12) and (13) on the size of the jumps 
in a. Since on dQ. we either have p=0orn-u = 0, then integration by parts confirms 
that 

(u, Vp)o,n + (V • u,p) 0 ,n = 0 . 

For any r > 0, which we specify later, we have 

^■(u,p;u,p) 

= (u/a/u, u/A/a) 0 ,n + (a / aVp, ^/aVp)o,n ~ 2(u, Vp) 0 ,n + (V • u, V • u) 0 ,n 
+2r(V • u,p) 0 ,n+ 2r(u, Vp) 0 ,n + r 2 (p,p) 0 ,n - T 2 (p,p)o,n 
= (u/^/a + (r - l)yfdVp,u/y/a + (t - l)y/aVp) 0i fi 
+(V • u + rp, V • u + rp) 0 ,n - r 2 (p,p) 0 ,n + (2r - r 2 )(y/aVp, y/aVp ) 0 ,n 

> (2r - r 2 )(v^Vp, \/aVp ) 0| n - r 2 (p, p) 0 ,n 

> (2 r - (1 + 7)'r 2 )|| % /aVp|| 2 n . 



Choosing r = 1/(1 + 7) leads to 

/(u,p;u,p) > r||V^Vp||g n . 

We then also have 

ll u /\/a|lo,n < 2 (II U /V« - + ||\AVp||^ n ) < 2(1 + l/r)^(u,p; u,p) 

and, clearly, 

l|V-u||^ n <:F(u,p;u,p), 

which completes the proof of (14). 

Upper bound (15) follows from 

^(u,p;v,?) < 2(i'(u,p;u,p)) 1/2 (^'(v,g;v,g)) 1 / 2 

and 

^■(u,p;u,p) = Hu/v^- \/aVp|| 2 >n + ||V -uH^n 

<2(||u/v^||o, n + ll\/aVp|| 2 in +||V-u|| 2 in ) = 5(u,p;u,p). *■ 

The proof of Theorem 1 is completed by adding the term ||aV x (u/a)||o,n to both 
sides of the inequalities (14) and (17). g 

Theorem 1 states that ellipticity and continuity of the least-squares bilinear form 
in terms of the norm |||(-, ■) 1 1 1 is independent of the jumps in a. Note, however, 
that the ellipticity constant 71 in (12) depends on the size of a, in particular, on the 
positive constant min xe n u(x) through the Poincare-Friedrichs inequality (16). 

The scaling of the norm |||(-, -)||| has the following physical interpretation. In areas 
where a is relatively small, Vp is allowed to be relatively large, and one has to expect a 
less accurate approximation there compared to areas where a is large and Vp is therefore 
small. In contrast, the velocity u = aVp can be expected to be more accurate in areas 
where a is small and less accurate, in general, where a is large. Ellipticity with constants 
that are independent of the jumps in a asserts that the scaling in T{-, •; •, •) correctly 
reflects these attributes. 

Singularities at Interface Corners and Cross-Points 

This section is concerned with the behavior ofp and u at or near the interface curve. 
Most of what we present in this section is well-known; we refer to Strang and Fix [14, 
Chapter 8] for further details. 

Recall from the previous section that the solution of (6) satisfies u 6 H{ div; fl) fl 
77 (curl a; fl). This implies that, at a point on a smooth segment of the interface 
curve, the normal component n • u and the tangential component n x (u /a) must be 
continuous. Assume that fl = fl U fl with constant diffusion coefficients a + and a ~ , 
respectively, and let u + = (u^uj) and u _ = (uj~,uj) denote the solution restricted 
to the respective subdomains (see Figure 1). Then ui and «2 must satisfy the jump 
conditions 

U~^~ U 'LL 

niut + n 2 ut = niu7 + n 2 u J and n 2 — [r — n 1-^- = n 2 — ni— . (18) 

<2+ a + a a 

For example, consider the situation shown in Figure 1 (which we will encounter again as 
Example 2 in the final section of this paper). Across the vertical part of the interface, 
ui = n • u will be continuous while u 2 = n x u has a jump factor of a + /a~ . Similarly, 
across the horizontal part of the interface, u\ = — n x u has a jump factor of a + /a^ 
while u 2 = n-u is continuous. At the interface corner, both of these conditions must be 
satisfied, i.e. , u\ and u 2 must jump by a factor a + ja~ and be continuous at the same 
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Figure 1: Interface with corner 


time. Obviously, there are only two ways for this to happen: either u = 0 or u = oo at 
the interface corner. In general, the latter case is encountered at interface corners — the 
behavior of u is singular there. 

Without loss of generality, assume that the singularity occurs at the origin, and 
consider the polar coordinate representation 


( \ ~ ( r cos $ A 

\ xn J ~ \ r sin 9 J 


The solution of (3) then admits the representation 


f 7’“(A+ cos a# T A+ sinafl) + p+(r, 9) , in fi+ , 
\ r a (X~ cos a# + Aj sina0) + p~(r, 9) , in , 


where p+ £ H 2 (n+),p £ H 2 {Q~ ) (cf. [14, Section 8.1]), a £ (1/2,1), and Xf,X f are 
constants. Using 



cosfl^-sinfli^r \ 
sin0£ + cos0± fg J 


(19) 


leads to 


ui(r, 9) 


aa+r 01 1 (X+ cos(a - 1)9 + X+ sin(a - 1)9) + uf(r,9) , in , 
aa _ r a-1 (A“ cos(a — 1)9 + Xj sin(a — 1)0) + uj" (r, 9) , in Q~ , ^ ' 


and 


« 2 {r, 9) 


aa+r a X (-A+ sin(a - 1)9 + A+ cos(a - 1)0) + u+(r, 9) , in . 

aa-r a - 1 (-X~ sin(a - 1)9 + Xj cos(a - 1)9) + v,2 (r,9) , in ft - , ^ ' 


with uf ,v ,2 G f? 1 (f2 + ) and £ H 1 (Q,~). The parameters a,A+,A+,A~, and Xj 

are computed such that conditions (18) are fulfilled. Setting p = a + /a~ leads to the 
matrix equation 


—p sin a^ir 

p cos a^ir 

— sin a~ 

— cos a |r 


A+ 


" 0 ' 

— COS Ql“7r 

— sin a |tt 

cos a 1 - 

— sin a-| 


K 


0 

— cos ai r 

— sin air 

cos air 

sin air 


K 


0 

p sin air 

— p cos air 

— sin air 

cos air 


L A," J 


0 
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For this homogeneous system of linear equations to have a nontrivial solution, its de- 
terminant must vanish, which leads to 

+ — )(cos 7ra — cos 2wa) + 2 — cos 7ra — cos 27m = 0 . (22) 

/d [1/ 

The exponent a that determines the degree of the singularity apparently depends on the 
size of the jump p. It can be shown that (22) always has a unique solution a G (1/2, 1). 
For fi — ► 1, i.e. , as the jump disappears, we have a — ► 1, i.e., the singularity disappears 
as well. For /i— >• 0 or p — >■ oo, a tends to 2/3, which is exactly the value obtained for a 
reentrant corner with exterior angle 7r/2.' It is straightforward to extend the procedure 
outlined above to any number of adjoining subdomains and any size of angles (cf. [8]). 
We therefore have a computational technique to compute the shape of the singularity 
at interface comers and cross-points where two interfaces intersect. This technique will 
be fundamental for the finite element approach described in the next section. 


Finite Element Approximation 


The minimum of G(u, p; /) is approximated using a Rayleigh- Ritz finite element 
method. Let T h be a triangulation of ft, which we assume to be quasi-uniform (cf. 
[3, Chapter 4]), and let W ft and V h be appropriate finite-dimensional spaces. The 
interface is required to be the union of edges of the triangulation. If the interface 
is cutting through elements of the triangulation, then special techniques have to be 
considered in order to average the parameters properly, which complicates the whole 
approach. We do not address this task or the problems associated with it here, but 
instead assume that the interfaces are restricted to edges of the triangulation. For the 
sake of exposition, we also assume that each segment of the interface curves is parallel to 
one of the coordinate axes. It is easy to see that the following development of the finite 
element approach can be generalized to isoparametric elements, where the interface 
curves are logically aligned with coordinate axes. 

It is desirable, in general, to use conforming finite elements, where the finite- 
dimensional spaces satisfy W ft C W and V h C V. Along straight segments of the 
interface curve, this can be accomplished by enforcing condition (18) on the finite ele- 
ment basis functions. Using bilinear finite elements on rectangles, for example, a basis 
function for u\ at a node on a horizontal interface segment is continuous in the in- 
direction and has a jump of si2e a + /a~ in the ^-direction. Such a basis function for «i 
at a node on a vertical interface segment is continuous (in both coordinate directions). 
Under the assumption that all the interface curves are straight lines which do not in- 
tersect each other (we will address the case of interface corners oi cross-points later), 
we can therefore construct piecewise bilinear finite element spaces: 

V h = {q E V : q\r bilinear on T for all T G T h } 

W h = {v G H( div, ft) n if (curl a, ft) : Vi |t bilinear on T for all T G T h } . 


The finite element approximation. (u h ,p h ) G W ft x V h is then defined as the solution 
of the minimization problem 


G(n h ,p h -J) 


min 

(v\? h )e W h xV h 


G(v\q h -J). 


(23) 


One of the main practical advantages of the least-squares finite element approach over 
other variational formulations consists in the fact that the minimum of the functional 
constitutes an a posteriori error measure. This follows from the general relation between 
the least-squares functional and corresponding bilinear form. The main point here is 
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the fact that the least-squares functional is zero at the solution (u,p), which leads to 


G(u h ,p h -f) 

= G(u h ,p h -J) - G(u,p;f) 

= T(u h ,p h ;u h ,p h ) + 2 (/, V • u ft ) 0i n - F h (u ,p; u, p) - 2 (/, V • u) 0 ,n 
= P h (u — u h ,p — p h ;u — u h , p — p h ) . 

Under the above assumptions, we get the following convergence result for the finite 
element approximation. 

Theorem 2 Assume that for (u,p), the solution of (10), we have (u,p)|ni G 
(H 1+l5 (f2j)) 3 for some 8 E (0,1] and for i = 1,...,/. Let (u h ,p h ) G W ft x V h be 
the solution of ( 23). Then 


j 

|||(u,p) - (u\p h )||| < Ch* (||u|| 1+ ,, n . + ||v^Plli + *,n.) (24) 

2 = 1 


where the constant C is independent of h and of the size of the jumps in 

Proof. From Theorem 1 and Cea’s Lemma (see, for example, [3, Theorem 2.8.1]), 
we obtain 


ll(u.P)' 


(u A ,/) 


<2! 


mm 

Tl (v k ,q h )e'W h xV h 


(u,p)-(v\g* 


Moreover, for (v, q) E W x V , we have 

lll( v . ?)lll 2 = ||V • v||g 0 + ||aV x (v/a)||g n + ||v/V^Ho,n + HVaVlzIlo.n 

J 

= (ll V ' v llo,n, + llaV x (v/a)||g >n . + ||v/V^||^ rii + ||VaVg||g in .) 

2 = 1 

< Cl J2 (H V ' v llo,n, + l|V x v||g in . + ||v/v^||o,n, + ll\/“Vg||g in .) . 

2 = 1 

Since by assumption G H 1 ^) and, similarly, v ft | n , G H 1 ^) for each v h E W h , 
then for i = 1 , . . . , J we have 

II V ■ (u - v A )||g nt + || V x (u - v A )||§ >n . < c 2 |u - v h \l t n, ■ 


This leads to 

J 

lll(u,p)-(v'V)||| < c 3 ]T (|u - + ||(u - v^/V^lkn* + IIv^Cp ~ f^lli.nO • 

i = 1 

Standard interpolation properties of piecewise bilinear functions (see, for example, [3, 
Theorems 12.3.3 and 12.3.12]) lead to 

||u - v h ||i i n, < c 4 /i' 5 ||u||i + {,n, 

lb - ? A ||i, n, < C5/i*||p||i+«,n i 


which completes the proof, g 

If the interface curve is not a straight line, or, more generally, not sufficiently smooth, 
then the finite element approximation becomes excessively more complicated. In the 
preceding section we saw that, for the solution (u,p) of (10), u has the singular behavior 
shown in (20) and (21). It is easy to see that this implies u|n ; $ (Lf 1 (Q,)) 2 for all 
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subregions f2j adjacent to the interface corner, and therefore the standard finite element 
approximation results do not apply. 

Moreover, in order to have u A £ H( div, f2) n if (curl a, f2) in the neighborhood of 
an interface corner, it is necessary and sufficient to require to have the form of 
(20) and (21). In other words, in order to have conforming finite elements, we must 
include a singular -basis function at each interface corner (or cross-point): The tools 
developed in the previous section allow us, in principle, to compute the exact shape of 
such a singularity. Multiplied by a standard piecewise bilinear function, such a singular 
function could then serve as a basis function at that point. A procedure of this type 
is described in [14, Section 8.2] along with special techniques to solve the resulting 
discrete system. However, this approach requires special stencils for these singular 
points, which complicates the overall finite element approach. Instead, we consider an 
alternative nonconforming finite element method, based on simple basis functions like 
bilinears on rectangles. 

We construct ~W h observing the fact that, for the right-hand side in (11) to be 
defined, we must have W h C H(div,Cl). This implies that, for u ft £ W h , n ■ u h 
must be continuous across all interfaces. Now consider the bilinear finite element basis 
function associated with the interface corner in Figure 1. For u/‘ £ W h C H( div, 0), 
we must require that «i is continuous in the xi-direction across the horizontal portion 
of the interface; that U 2 is continuous in the ^-direction across the vertical portion 
of the interface; and that both ui,U 2 are continuous elsewhere. From (18) we see 
that u £ ff(curla,fl) requires «2 to have a jump across the vertical portion of the 
interface, while ui must have a jump across the horizontal portion. This causes a 
conflict at the corner. The finite-dimensional space W ft will, therefore, not be contained 
in Ff(curla, 0), in general, and W h x V h (£_ W x V. In particular, the bilinear form 
T{-, •; ■, •) is not defined on W ft x V h . For u, v £ W + W ft and p, q £ V + V h , we define 
a modified least-squares bilinear form by 

F h (u,p-,v,q) = (u/y/a- y/aVp,v/yfi- yfe,Vq) 0i n 

+(V-u,V-v) 0 , n + EL(Vxu,Vx v) 0 ,n, • 1 j 

On W x V, this bilinear form coincides with T(- , •;•,•). The least-squares functional 
corresponding to T h (•,•;•,•) is 

j 

G h (u,p-J) = ||u/Va-\/aVp||g in +||V-u+/||g n -|-^||V x u||g >0i . (26) 

i=i 


by 


Let (u,p) £ W x V be the solution of (10), and let (u h ,p h ) £ W h x V h be defined 


G h (u h ,p h -f)= min G h (v h ,q h ; f) 
^ ' (v\gfc)6 W h xV h V ' 


(27) 


Recall that, at an interface corner, u has a singularity of the form given in (20) and 
(21). This implies that we cannot expect to approximate u to the same accuracy by 
standard finite elements near a singularity as elsewhere in 0. Moreover, since our finite 
element subspace W ft x V h is not contained in the space W x V in which we have shown 
ellipticity, the relatively large error near a singularity will deterioriate the finite element 
approximation in the entire region. This phenomenon is reflected by the fact that, 
in the presence of singularities, G h (n h ,p h \ f) does not decrease as h is made smaller. 
We will observe this behavior later in our computational experiments. It is therefore 
necessary to introduce a weight function which decreases near the singular point. The 
proper choice of weighting is motivated by the form of the singularity. 

In particular, (19), (20), and (21) imply Vu ~ r“~ 2 in the neighborhood of the 
singularity. If T* denotes an element of the triangulation T h such that the interface 
corner appears as one of its vertices, then 


r 2 ~ a V(uj - u) 


0 T h — 0(h 2 ) . 
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If the right-hand side / and the restriction of a to fi,- are sufficiently smooth, then we 
know that u 6 (Hf oc (Q,)) 2 , i.e.. u £ (H 2 (Q)) 2 for any compact Q C 0;. This implies 
that v h £ W A exists such that 


\\V-(n-v h )\\ afl = 0{lr). 

The other terms in (26) can be treated in a similar way, which motivates the definition 
of the weighted least-squares functional 

G h w (u,p-f) = Hu/v^- \M>||s h l _ a n 

+11 V ■ u + /|lo,/i, 2 -a,n + T.U II V x u lln,/ ! . 2 -Q.n 1 ( ' 


and corresponding bilinear form 

.p:v,(?) = (u/x/a — y/aYp),v/^/a - v/aVg))o./,,i-o.n 
+ {Y U, V v) 0 .A. 2 -a.n + Ef=l( V X u). V X v) 0 ./ 1 . 2 -a,n, • 


The inner product (•. -)o,h.j.n is defined as 

(v, w)o,ft,j.n = («■ V,U' w) 0 ,n 

with the weight function w h,i3 constructed in the following way: Consider a sequence of 
triangulations {T h ‘ .1 = 0 with H = ha > h\ >■■■ > Iil — h. Let Of' denote 
the union of of all elements T hl £ T h ‘ with the singular point as one of their vertices. 
The weight function is defined as 

f h? for x £ Of , 

u> M (x) = < /if for x £ tts'- 1 \fij> , / = 1 L , (30) 

\ 1 for x £ 0\f2f° . 


Let (u f,,pf,) £ W h x V h be the solution of 
G h w (u h w ,p h w ,f) = 


min W,(v\g ft ;/) . 
(v‘,{M£W‘xr‘ 


(31) 


In the final section of this paper we will demonstrate, by means of numerical results, that 
the weighted functional G (uf, , p ; /) actually decreases regularly as the triangulation 
is refined. Note, however, that this does not mean that the error u — is small 
throughout the region 0. In particular, the pointwise accuracy usually deteriorates 
near singularities. This suggests that, the weighted functional should be combined with 
local refinement techniques to guarantee satisfactory resolution in the entire region. 
Multilevel refinement techniques are especially effective in this context. 


Multilevel Algorithms 

Consider the sequence of triangulations {T hl , / = 0, . . . , L) introduced earlier. As- 
sociated with each triangulation {T hl } is the finite element space W ft| x V h> , which we 
may also denote by W; x Vj. This leads to a nested sequence of spaces 

W 0 x Co C Wi x Vi C • • • C W L x V L = x V h . 

On each level l, 0 < l < L, an operator T\ : W; x V\ — ► W; x V; is defined by 
(<:F/(u,.p); (v, q))) = ^(u,p;v,g) for all (v,g) £ W| x h , 
where the inner product ((■; •)) is given by 

((u, p ); (v, q))) = (u, v) 0 ,n + (Va P, Va q) 0 ,n ■ 
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In terms of the operator Ti, the discrete problem (23) can be written as 

F,(n i,Pi) = F, (32) 

where the right-hand side is defined by (( Fi , (v, q ))) = — (/, V-v) 0 ,n for all (v, q) £ W; x 
V], For the solution of (32), it is natural to use an iterative method since this requires 
only a computational procedure for the action of the operator T\ for l = 0 
The cost for one call of such a procedure is proportional to the number of unknowns 
N = 0(h~ 2 ). 

The conjugate gradient method (cf. [13, Section 8.7]) computes its iterates 
(u' l n \p' l n ^) £ W; x Vi in the Krylov subspace 

K n (Fi,T{) = spanjF 1 ;, F\Fi , .... T™~ l F\) 

according to the minimization property 

G(u[ n) ,p| n) ;/) = min G(v,,®j ;/) . 

Since the condition number of T\ is proportional to 0(hf 2 ) (cf. [5, Theorem 3.2]), the 
number of conjugate gradient iterations required to achieve a certain accuracy grows like 
0(h ; _1 ) (cf. [13, Section 8.7]). The overall computational complexity to solve a discrete 
problem on T h ' using the conjugate gradient method therefore grows like 0(hf 3 ). 

Optimal computational complexity, 0(hj 2 ), can be achieved, under certain assump- 
tions on T({-, •);(•, •)), by a full multigrid algorithm. The basic ingredients for multilevel 
methods are the projection operators Vi,Qi : W h x V h -* W; x Vj which are given by 

F(Vi(u,p)-(v,q)) = JF((u ,p);(v, 5 )) for all (v,g) £ W, x V\ 


and 

((Qi( u .p);( v .9))) = (((u.p);(v,?))) for all (v,g) £ W, x Vj 

and smoothing operators Hi : W; x Vj — * W/ x Vi representing iterations on level /. 
With these tools, standard multilevel algorithms can be constructed (see [5, Section 4] 
for further details). A detailed study of the convergence properties of multilevel methods 
for first-order system least-squares applied to problems with discontinuous coefficients 
will be given in [9] . 


Computational Experiments 

In our examples, we consider (3) on the unit square fi = {(x\,X 2 ) E R 2 : 0 < 
xi,X 2 < 1), with / = 1 and T# = <9fi. We show the results of two sets of experiments, 
one with a smooth interface curve and the other with an interface corner causing a 
singularity in u. 

Example 1. In this example, the interface curve is a straight line, so no singularity 
occurs. We consider 

u(*r i , x 2 

with different choices for the values for a + and a~ . The solution shown in Figure 3 was 
obtained for a + = 10 and a~ = 0.1. 

The computational results shown in Table 1 indicate that the approximation of the 
solution improves nicely as the triangulation is refined, independently of the size of the 
jumps. The reduction factor displayed in parentheses is the ratio of the minimum values 
on the current and next coarser level. Note that they do not quite reach 0.25, which is 
due to the lack of regularity at the corners of the subdomains. In fact, due to the corners 


, _ J a + , 0 < X 2 < 0.5 . 
' — \ a~ , 0.5 < x 2 < 1 


(33) 
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Table 1: Example 1: Minimum value (reduction factor) of the functional G h 


a + / a 


1 



10 



10 2 



10 4 


h 

1/8 

2.42 

• 10~ 2 


3.50 

• io ~ 2 


4.13 

■ 10~ 2 


7.81 

• io - 2 


1/16 

7.18 • 

10~ 3 

(0.30) 

1.07- 

10~ 2 

(0.31) 

1.26 • 

io - 2 

(0.31) 

2.30- 

10“ 2 

(0.29) 

1/32 

2.08- 

1(T 3 

(0.29) 

3.14- 

10~ 3 

(0.29) 

3.71 • 

10~ 3 

(0.29) 

6.41 • 

10~ 3 

(0.28) 

1/64 

5.92- 

1(T 4 

(0.28) 

9.05 ■ 

10~ 4 

(0.29) 

1.07- 

10“ 3 

(0.29) 

1.75- 

10" 3 

(0.27) 


with interior angle n/2, we have neither u £ ( H 2 (Q + )) 2 nor u £ (H 2 (Q~)) 2 . Conse- 
quently, the finite element approximation deteriorates near these corners. In contrast to 
the situation at singularities, however, this behavior does not contaminate the solution 
elsewhere since the basis functions corresponding to these points are conforming. 

Example 2. This example shows results for a problem with a singularity in u. We 
choose 


f a + , 0 < xi,X 2 < 0.5 , 

( a ~ , elsewhere 


(34) 


(see Figure 1) with different choices for the values for a+ and a~ (again with a + = 10 
and a - =0.1 for the solution shown in Figure 4). 

The exponents for this example with the three values for the coefficient jumps used 
in Table 2 are given by a = 0.7317, 0.6739, and 0.6667, respectively. Note that the last 
number is very close to the value a = 2/3 that one gets for a reentrant corner with 
interior angle 3/27T. Using the weighting described earlier with H = 1/8 leads to the 
results listed in Table 2. The modified least-squares functional is again reduced nicely 
and regularly as the triangulation is refined. Note that using the weighted functional 
means that the pointwise approximation deteriorates close to the singular point, where 
local refinement can be used if a better pointwise resolution is needed. 


Table 2: Example 2: Minimum value (reduction factor) of the weighted functional G 


a + /a 


1 


10 

10 2 

10 4 

h 

1/8 

2.42 

• 10“ 2 

3.74 

• 10“ 2 

5.17 - 10~ 2 

1.20 • 10" 1 

1/16 

7.18 ■ 

IO -3 (0.30) 

1.16 ■ 

10“ 2 (0.31) 

1.58 • 10~ 2 (0.31) 

3.53 • 10- 2 (0.29) 

1/32 

2.08- 

IO" 3 (0.29) 

3.43- 

10“ 3 (0.30) 

4.66 • 10~ 3 (0.29) 

9.84 • 10~ 3 (0.28) 

1/64 

5.92 ■ 

10“ 4 (0.28) 

9.95 • 

10~ 4 (0.29) 

1.34- 10" 3 (0.29) 

2.68 • 10“ 3 (0.27) 


Table 3: 

Example 2: 

Minimum value of the functional G h 

a + /a~ 

1 

10 

10 2 

10 4 

h 

1/8 

1/16 

1/32 

1/64 

2.42 • 10~ 2 
7.18 • IO" 3 
2.08 ■ 10“ 3 
5.92 -10- 4 

4.36 ■ 10~ 2 
2.39 ■ IO" 2 
2.07 • 10“ 2 
2.22- 10" 2 

7.50 ■ IO” 2 
5.49 ■ 10- 2 
5.35 • 10“ 2 
5.66 • 10- 2 

1.62 • 10- 1 
9.89 • IO" 2 
8.86 • 10- 2 
9.33 • 10“ 2 


In order to illustrate the necessity of modifying the functional in the neighborhood 
of a singular point, we also computed the results for the unmodified functional G h 
instead of G^. The numbers in Table 3 show that this functional is not satisfactorily 
reduced in the course of refining the triangulation. Our numerical tests have shown 
that minimizing the unmodified functional leads to poor finite element approximations. 
Figure 2 shows the error with respect to the exact solution for p for the weighted 
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functional and for the unmodified functional. Obviously, for the unmodified functional, 
the resulting error between the discrete and exact solution is relatively large in the entire 
domain. This behavior seems to indicate that using the unmodified functional has the 
effect of trying too hard to satisfy the first-order system (6) close to the singularity, 
where it is impossible to get a good approximation with bilinear finite elements. For 
the weighted functional, however, the error is smaller and mainly occurs in a rather 
small neighborhood of the singular point. 


0.01 5 




Figure 2: Example 2: Error in the pressure p for the weighted functional (top) and the 
unmodified functional G h (bottom) 
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Figure 4: Example 2: Pressure p (top) and flux components u\ and U 2 
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ON DGS RELAXATION: THE STOKES PROBLEM* 


A. J. Meir 

Department of Mathematics 
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ABSTRACT 

Multigrid methods have proven to be efficient methods for solving partial differential 
equations (especially those of elliptic type). There is also growing experience with 
multigrid solvers for fluids problems, e.g., the Stokes and Navier-Stokes equations (using 
both finite element and finite difference discretizations). 

It is also well known that at the heart of any multigrid method is the smoother. In 
this work we look at a smoother introduced by Brandt and Dinar (DGS relaxation), 
and we examine some of its properties and consider some possible modifications to it. 
It is well known that multigrid performance using DGS relaxation is sensitive to the 
treatment of boundaries; this issue is addressed. 


INTRODUCTION 


Multigrid methods have proven to be efficient methods for solving partial differential 
equations (especially those of elliptic type). There is also growing experience with 
multigrid solvers for fluids problems, e.g., the Stokes and Navier-Stokes equations (using 
both finite element and finite difference discretizations. (See, e.g., [1]— [13] and the 
references therein.) 

It is also well known that at the heart of any multigrid method is the smoother. In 
this work we look at a smoother (DGS relaxation; distributed Gauss-Seidel relaxation) 
introduced in [2] and [3], as it applies to the Stokes problem. We examine some of its 
properties and consider some possible modifications to it. 

We consider the well-known Stokes equations; these equations, which model flows 
with small velocities (creeping flows), may be viewed as a linear version of the Navier- 
Stokes equations (which describe the flow of an incompressible, viscous fluid). The 

*This work was supported in part by a contract from American Computing, Inc. 



following analysis extends to the (nonlinear) Navier-Stokes equations and is the subject 
of a forthcoming paper. 


The Stokes equations in D. are, where Q is a bounded domain in M 3 (we assume 
the domain is three-dimensional; obviously, the following results hold equally well for 
two-dimensional domains), 


4 

- Au + Vp = f 

(1) 

and 


V • u = 0 . 

(2) 

On dH (the boundary of Q). 


u an = g • 

(3) 


Here u and p are the velocity and pressure, respectively (the unknowns). Given are the 
body force f and the boundary condition g. 

There exists a large body of work which deals with the analysis and the development 
of various approximation methods of solutions for this system of equations. (See, e.g., 
[14]— [17] and the references cited therein.) Here we propose yet another such method 
which is based on a reformulation of the equations (suggested by DGS relaxation). 

Remark 1. It is well known (see [15] and [16]) that given f e (H 1 (Q) 3 )* and 
g G H l/2 {dVl) with f dn g-n ds — 0 the Stokes equations (1 )-(3) have a unique solution 
(u ,P) 6 H'(Q f x msi). 

Throughout the paper we assume that Q is a bounded, simply-connected domain in 
M 3 which is of class C 1,1 or is a convex polyhedron. (See [16] or [18].) The boundary of 
the domain is denoted dQ and n is the unit outward-pointing normal vector to 0. Here 
and in the sequel H S (Q) (s a positive integer) is the usual L 2 (fl)-based Sobolev space, 
JT 1 / 2 (9f2) is the trace space of H l (Q), and H~ l l 2 {dQ) is its dual. (See [18].) Also, 

Lg(f2) = jp G L 2 (Q) : J^pdx = oj 

(i.e., it is the subspace of L 2 -functions which have zero mean; see [16] and [17]). We 
also introduce the following subspaces of if _1//2 (<9f2) 3 and H 1 (O) 3 (see [19] and [16]): 

H- 1/2 {nf := {t G H- 1/2 (dnf : t • n = o} 

and 

i^(ft) 3 := {* G H 1 ^) 3 : * • n| 5 n = o} . 

On Hq(Q) 3 (the space of functions with zero trace on the boundary) and on H)(Q) 3 , 

(IV x ( ■ )llo + llv • ( • ) llo) 1/2 

is a norm equivalent to the /T-norm (due to the existence of a Poincare-type inequality 
for domains such as those discussed above; see, e.g., [16]). Here || • || s denotes the H s - 
norm (s = 0 for L 2 ). 
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The Stokes equations can be formally written as the system 


u' 


- -A 

V 

u' 


f 

p. 


.-v- 

0 . 

.P. 


.0. 


u|an = g. 


DGS relaxation may be viewed as Gauss-Seidel relaxation on a right preconditioned 
system or Gauss-Seidel relaxation on .an equation with transformed variables. The 
change of variables (up to a sign change) as described in [2] and [3] (also in [13]) is 


given as 


LM 


u' 


’ -A 

0 ' 

u 


f 

P. 


-V- 

-A. 

.P. 


.0. 


It is easily seen that the (so called) distribution matrix M (the right preconditioner) is 


M = 


I V 
.0 A. ' 


Formally, M” 1 , the inverse change of variables, is given by 



So the change of variables is given by 


u' 


I 

V 


u‘ 


u 


I 

-VA- 1 ’ 


u' 

-P. 

— 

.0 

A. 


. P . 

or 

-P. 


.0 

A" 1 


. P . 


Thus we end up with the equations 

— Au = f 


and 

— Ap = V • u . 

An obvious obstacle in this approach is the lack of boundary conditions on u = u - 
VA -1 p = u - Vp and onp = A -1 p- Obviously we cannot specify Vp on the boundary 
(one would like to do that since u| 9 n = g is given), since this would result in an 
overdetermined system for p. Note that even if a boundary condition for p were derived 
and we were to derive a boundary condition for u, this boundary condition for u would 
involve p (namely, Vp). Thus we would end up with a system of equations that are 
coupled through the boundary conditions. (See [4].) 

Thus it is proposed (in [2] and [3]) that this system be solved iteratively (with no 
mention of the boundary conditions to be used); that is, we perform a Gauss-Seidel 
step on the transformed system and then perform the inverse change of variables. In 
practice we only work with the original variables (the new variables are introduced 
only to describe the method). In fact, some ad hoc modifications to the method are 
proposed in [13]; these improve the method in the presence of boundaries. 
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An obvious question is whether other changes of variables may yield a similar iter- 
ation scheme. (See [5].) The most obvious change of variables that comes to mind will 
avoid forming the Laplacian and inverse Laplacian in the equation for the pressure; it 
will therefore be given by the following distribution matrix: 


M = 


I A _1 V' 
0 I 


Formally, the inverse of the distribution matrix is 


M _1 = 


I 

.0 


— A _ 1 V‘ 
I 


Now 


LM = 


' -A 

-V- 



? 


so the change of variables is given by 


u' 


[■/ 

A -1 V] 


u' 

p. 


.0 

I j 


.P. 


ti' 


I 

— A _1 V] 


u' 

p. 


.0 

I J 


.P. 


This change of variables will yield a relaxation method which we call MDGS (modified 
DGS) relaxation. 

Thus we end up with the equations 


— Ati = f 

and 

— p = V • u . 

An obvious advantage of this method is that there are no additional boundary 
conditions which must be imposed (or, more precisely, we may impose the boundary 
condition u|an = g, and no boundary condition is needed for p). A drawback of the 
method is that it is more complicated (since the change of variables now involves an 
inverse Laplacian, although this can be approximated locally). This alternative is very 
similar to an iteration for Uzawa’s method; see [6], [20], and [21]. (See also [14] and 
[16].) 

We abandon, for the time being, any further discussion of DGS (and MDGS) relax- 
ation and consider a related alternate formulation of the Stokes problem. 


ALTERNATE FORMULATION 


We consider the following formulation for the Stokes problem: 

— Av = f, (4) 
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<1 

II 

1 

< 

< 

(5) 

and 

Vx$ = 0. 

(6) 

With boundary conditions 

v|gn + #|an = g 

(7) 

and 

$ • n|an = 0 . 

(8) 

An alternate formulation with boundary condition # x n| a n = 0 (instead of (8)) may 
be treated as well; details will appear in a forthcoming paper. 

This formulation is equivalent to the Stokes equations when we set the velocity 

u = v + $ 

(9) 

and the pressure 

p = V • # . 

(10) 

Note that if (8) is satisfied then / n V • # dx = 0, and we may in 
p = —V • v. 

fact (due to (5)) set 

Since # satisfies 

V • $ = —V • v , 

(11) 

Vx$ = 0, 

(12) 

and 

# • n| sn = 0 , 

(13) 

there exists <j) such that $ = V0; moreover, is characterized as 

the solution of 

1 

> 

II 

1 

<1 

II 

< 

< 

(14) 

and 

V(f> ■ n| an = 0 . 

(15) 

Because # = V</>, the fact that (j) (the solution of (14) and (15)) 
an additive constant does not cause any difficulties. 

is unique only up to 

In light of the above, one may replace (4)-(8) by 


— Av = f , 

(16) 

1 

t> 

II 

<1 

c 

(17) 

v| an + V(/>|an — g > 

(18) 

and 

V(j) • n|an = 0 . 

(19) 


The relationship of this formulation to DGS and MDGS is now patently clear if we 
identify 

u = u = v and p = Ap = —V • v = V • # = A <fi . 
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The advantage of this point of view is the availability of boundary conditions for 
the various unknowns. A difficulty in this approach is the fact that the equations are 
coupled through the boundary conditions; this situation is unavoidable, however (as 
observed earlier). We also have the following theorem: 

Theorem 2. The formulation (l)-(3) is equivalent to the formulation (4)-(8) and to 
the formulation ( 16)-(19 )." 


Proof: If (u ,p) e H l (Q) 3 x Ll(Q) is a solution of (l)-(3) then let $ be the unique 
solution of 

V • = p , 

Vx$ = 0, 

and 

$ • n| 9n = 0 . 

Note that V-u = 0 and A# = VV-3> (due to the fact that — A3> = Vx V x$-VV-$ 
and V x $ = 0); thus, A# = Vp. Setting 

v = u — $ , 

it is easily seen that (v, <&) satisfies (4)-(8). Conversely, if (v, #) satisfies (4)— (8), then 
set 

u = v + <3? 

and 

p = V • $ . 

Recall A<fr = Vp; clearly (u ,p) satisfies (l)-(3). 

It is well known that (5)-(6) and (8) are equivalent to (14) and (15), with the 
identification # = V</>. (See, e.g., [16].) To complete the proof we observe the following: 
if (u, p) satisfies the Stokes equations and if we set 

-A<j) = -p, 


V4> ■ n 


an 


0 . 


and 


v = u — V(j ) , 

then (v, (j)) so defined satisfies equations (16)— (19). Conversely, if (v,0) satisfies equa- 
tions (16)— (19) , set 

u = v + V0 

and 

P = A0 * 

then (u ,p) satisfies the Stokes problem (equations (l)-(3)). □ 
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WEAK FORMULATION 


Consider the following weak formulation: find v, s, and such that 
V G jy 1 (0) 3 with V • n| 3n = g • n , s G H~ 1/2 (dfl) 3 , # G #*(fi) 3 , (20) 

f {V x v • V x w + V • vV 'W + Vx$'Vxf + V' 3>V • dx 
Ja ■ ( 21 ) 

+ [ V • vV • dx + (s, w)an = (f , w) n Vw G H^(Q) 3 , \I> G If *(fl ) 3 , 

and 

(t, v + #) 9n = (t, g) 5n Vt G H~ 1/2 (dfl) 3 . (22) 

Here (-,-)n and (-,-)dn denote the duality pairing of H l { Q) 3 and (if 1 ^) 3 )* and of 
H 1 / 2 (dfl) 3 and H~ 1 ^ 2 (dQ) 3 , respectively. Or equivalently, consider the following weak 
formulation: find v, s, and $ such that 


v G H l (Q,f with v • n| 0 n = g • n , s G H n 1/2 (dQ ) 3 , 

< beH 2 (Q ) with V4>eHl(Q) 3 , 


(23) 


/J v x v • V x w + V • vV • w + A^A^} dx + J V • vA^ dx + (s, w)an 
= (f,w) n VwG^(0) 3 ,^Gff 2 (fi) with V^G^(fi) 3 , 


(24) 


and 


(t, v + V(j)) da = (t, g)an Vt G H n l/2 {dfl) 3 . (25) 

Theorem 3. Equations (20)-(22) and (23)-(25) are weak formulations for (4)-(8) 
and (16)-(19), respectively. 


Proof: Setting ^ = 0 and restricting w G ifo(fl) 3 in (21) we get that 



x v • V x w + V • vV • w} dx = (f , w)n , 


which implies that 

— Av = f 

in if _1 (fi) 3 ; letting w be an arbitrary element of H^(Q,) 3 we get that 

s = -V x v x n| 9n 

in H~ l l 2 {dfl) 3 . Now setting w = 0 and setting \I/ to be the solution of 

V\& = V# + Vv, 

Vxf = 0, 
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and 


^ • n| an = 0, 


we get that 

V • # = —V • v 

in L 2 {Q). Letting be an arbitrary element of ft^(fi) 3 we get that 

Vx$ = 0 

in L 2 (0) 3 . Finally from (20) and (22) we obtain (7). The proof for the formulation 
(23)-(25) proceeds similarly. □ 

For notational convenience, define 

A((v, $), (w, \S>)) := /'{Vxv-Vxw + V- vV • w + V x $ • V x \F} dx 

J 

+ / V • $ V • dx + f V • vV • dx , 

J n J n 

£(s,(w,^)) := (s,w)an, 

D( t, (v, $)) := (t, v + $)an , 

F((w,^)) := (f,w)n, 

G(t) := (t,g)an, 

and 

a((v, <p), (w, ip)) [ {V x v • V x w + V • vV • w + AcpA'ip} dx + / V • vAip dx , 
■Jn J n 

b( s, (w, </>)) := (s,w)an , 
d(t,(v,(p)) := (t ,v + V<£)an, 

/((w,^)) := (f,w) n , 
tf(t) := (t,g) sn . 

We denote 

ft -ft 1 ^) 3 X ^n (^) 3 

and 

H n := HUO-f x H^nf . 

On these spaces we use the usual product norm. 

With this notation we may write the weak formulations as follows: find v, s, and 
such that 

v G F 1 ^) 3 with v-n|an = g-n, s e H~ l/2 (dQ ) Z , # e ft*(fi) 3 , (26) 

A((v, «&), (w, ’3>)) + B(s, (w, <&)) = F((w, ^)) V(w, \F)eft„, (27) 
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and 


(28) 


B(t,(v,#)) = G(t) Vt € H- l / 2 (dQ) 3 . 

Equivalently, find v, s, and 0 such that 

v G H l {Q,f with v • n|an = g • n, s 6 H~ 1/2 (dQ) 3 , 

<j>eH 2 (Q) with v<f>eHl(nf , 

a((v, (t>), (w, 0)) + 6(s, (w, -0)) = /(( w, 'll))) 

Vw G , 0 e # 2 (Q) with V0 e ^(O) 3 , 

and 

d{ t, (v, 0)) = $(t) Vt G H~ 1/2 {dtt) 3 . 

Note that this weak formulation falls into the class of generalized saddle point 
problems of the type considered in [22]. (See also [14] and [23].) 

Lemma 4. The forms A{-, •), B(-, ■), D(-, ■), F(-), and G(-) are continuous; that is, 
positive constants A a, A b, A d, A p, and A G exist such that 


|H((v,$),(w,^))| < A a ||(v,#)|| w ||(w,^)||^, 

(32) 

|H(s, (w, ¥))| < A b ||s||_i/ 2 ||(w, #)||^ , 

(33) 

\D(t, (v, #))| < A D ||t||_i /2 ||(v,#)||^, 

(34) 

|E((w, #))| < A f ||(w,^)|| h , 

(35) 

l^(t)| < A G ||t||_i/ 2 . 

(36) 


(29) 

(30) 

(31) 


Proofs The proof is an easy consequence of Holder’s inequality and the definition of 
the forms. □ 

Define 


K b := {(w, T r ) G TL n : B{ s, (w, *)) = 0 Vs G H^ 2 (dQ) 3 } , 


and 

K d := {(v, #) G Un : D{ t, (v, #)) = 0 Vt G H~^ 2 (dQ) 3 } . 


Lemma 5. The forms A(-, •), B (•,•), and D(-, •) satisfy some inf-sup conditions; in 
particular, positive constants a, (3, and 8 exist such that 


inf sup 

(w (v,#)e/C0 


.A((v, #), (w, ^)) 

II (v, $)||tt||(w,«OII« 


> OL . 


(37) 


inf sup A((v, 3>), (w, *!>)) > 0 , (38) 

(v,#)gk:d\{o} (w <s>)eK B 
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(39) 


and 


inf 


B{ s,(w,*)) 

SUP TTTTi ^TTi- > P : 


s €H n l/1 {dO) (w ,¥)€H n ll s l|-l/2||(w, ^)||k 

D( t,(v,#)) 


inf 


teH;' /2 (dnj(v&)en„ ||t||-i/ 2 ||(v, 3>)||« 


> S. 


(40) 


Proof: The first condition (inequality (37)) follows from the observations that given 
(w, \I>) G Kb, setting v = w — \I> and $ = \1/ guarantees that (v, <&) G 1C D , that a 
positive constant c exists so that ||(v,4>)|]^ < c||(w, ^ r )||- w , and that 


4((v,#),(w,*)) > ||j(w,«')||5 i - 


Given (v, <&) G Kd \ {0}, set w = v + $ and \l r = <h; then, (w, 'J') G Kb ; moreover, 
it is easily seen that 

.4((v, #), (w, *)) > i (||V x v||g + ||V x *||2) + ||V • (v + #)||2 . 

Now if |(||V x v||q + || V x 3?||o) + || V • (v + <&)||§ > 0, then (38) holds. If this is not 

the case (i.e., if |(||V x v||q + ||V x #||q) + ||V • (v + $)||§ = 0), it easily follows that 
v + $ = 0, and, because (v, #) ^ 0, then V • v / 0. In this case we know (see [16]) 

that a w G Hq(Q) exists with V • w = V • v; setting ^ = 0, we get that 

A((v,#), (w,^)) > || V • v||q 

and conclude that the second condition holds. 

The third and fourth conditions (inequalities (39) and (40)) may be shown using 
the methods used in [24] to prove a similar inf-sup condition. □ 

Theorem 6. The weak problem (26)-(28) has a unique solution. 


Proof: This is a result of Lemma 4, Lemma 5, and the abstract theory detailed in [22] 
and [23]. □ 

It is an easy exercise to state, for (29)— (31) , results analogous to those stated in 
Lemma 4, Lemma 5, and Theorem 6. (Details will be given in a forthcoming paper.) 


DISCUSSION 


We point out that since the weak form of the problem falls into the class of gener- 
alized saddle point problems introduced in [22] (see also [14] and [23]), one may carry 
out finite element analysis for this problem in that framework. Such analysis yields 
existence and uniqueness results for the discrete problem (approximate problem) and 
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optimal error estimates for finite element approximation schemes based on these weak 
forms, provided that certain (discrete) inf-sup conditions hold. (Details will be given 
in a forthcoming paper.) 

An advantage of this formulation over the primitive variable (velocity-pressure) 
formulation of the problem is the fact that it is relatively easy to construct finite 
element spaces which satisfy the necessary inf-sup conditions. In fact there is complete 
freedom in choosing the spaces for v and for (the spaces that approximate H 1 (fi) 3 and 
if*( O) 3 ); in view of the error estimates it is reasonable to choose the same finite element 
space for both of these. Once these spaces have been chosen, we choose the 'space for s 
(the space approximating H~ 1 ^ 2 (dQ) ) as the restriction to the boundary of elements of 
the previous spaces (i.e., the trace space of the discrete spaces approximating ff*(D) 3 ). 
This choice for the discrete spaces guarantees that the necessary (discrete) inf-sup 
conditions are satisfied. Details and examples from computations will appear in a 
forthcoming paper. 

Another question to be investigated is the implications for multigrid codes employing 
DGS relaxation. Can these results be used in order to construct better smoothers 
(particularly in the neighborhood of boundaries)? As stated earlier, the relationship 
between this formulation and DGS relaxation is 

u = v and Ap = —V • v = V • <I> , 

but we also have that 

u = u + Vp = v + <& . 

Therefore it seems that when using DGS relaxation one alternative is to impose a 
homogeneous Neumann boundary condition on p (when solving — Ap = V • u) and the 
nonhomogeneous Dirichlet boundary condition g— Vp|an on u (when solving — Au = f). 

Moreover it may prove advantageous to keep explicit track of u and p on the bound- 
ary and use their values in the iteration. This may yield better behavior of DGS 
relaxation in the presence of boundaries. 

DGS relaxation (the change of variables described in [2] and [3]) is introduced in 
order to transform a saddle point problem into a problem which is definite. The fact 
that the new problem is still indefinite (a saddle point problem) is masked by the 
fact that the effects of the boundaries and boundary conditions have been neglected. 
Based on the previous analysis it is obvious that we are still faced with an indefinite 
problem. This must be taken into account when using this iterative scheme; one possible 
implication is that it may be advantageous to use an inexact Uzawa-type iteration to 
solve the problem. 


561 



REFERENCES 


1. Brandt, A.: Multi-Level Adaptive Solutions to Boundary- Value Problems, Math. 
Comp, vol. 31, no. 138, 1977, pp. 330-390. 

2. Brandt, A.; and Dinar, N.: Multigrid Solutions to Elliptic Flow Problems, in Nu- 
merical Methods for Partial Differential Equations, S. V. Parter ed., Academic Press, 
New York, 1979. 

3. Brandt, A.: Multigrid Techniques: 1984 Guide With Applications to Fluid Dynam- 
ics, The Weizmann Institute, Rehovot, 1984. 

4. Fuchs, L.; and Zhao, H.-S.: Solutions of Three-Dimensional Viscous Incompressible 
Flows by a Multi-Grid Method, Int. J. Num. Meth. Fluids, vol. 4, 1984, pp. 539-555. 

5. Linden, J.; Lonsdale, G.; Steckel, B.; and Stiiben, K.: Multigrid for the Steady-State 
Incompressible Navier-Stokes Equations: A Survey, Arbeitspapiere der GMD 322, 1988. 

6. Maitre, J. F.; Musy, F.; and Nigon, P.: A Fast Solver for the Stokes Equations Using 
Multigrid with a Uzawa Smoother, in Advances in Multigrid, D. Braess, W. Hackbusch, 
and U. Trottenberg eds., Friedr. Vieweg & Sohn, Braunschweig, 1985. 

7. Niestegge, A.; Witsch, K.: Analysis of a Multigrid Stokes Solver, Appl. Math. 
Comput., vol. 35, 1990, pp. 291-303. 

8. Verfurth, R.: A Combined Conjugate Gradient-Multigrid Algorithm for the Numer- 
ical Solution of the Stokes Problem, IMA J. Numer. Anal., vol. 4, 1984, pp. 441-455. 

9. Verfurth, R.: A Multilevel Algorithm for Mixed Problems, SIAM J. Numer. Anal., 
vol. 21, no. 2, 1984, pp. 264-271. 

10. Verfurth, R.: Multilevel Algorithms for Mixed Problems. II. Treatment of the 
Mini-Element, SIAM J. Numer. Anal., vol. 25, no. 2, 1988, pp. 285-293. 

11. Wittum, G.: Multi-Grid Methods for Stokes and Navier- Stokes Equations, Trans- 
forming Smoothers: Algorithms and Numerical Results, Numer. Math., vol. 54, 1989, 
pp. 546-563. 

12. Wittum, G.: On the Convergence of Multi-Grid Methods with Transforming 
Smoothers, Theory with Applications to the Navier-Stokes Equations, Numer. Math., 
vol. 57, 1990, pp. 15-38. 

13. Yavneh, I.: Multigrid techniques for Incompressible Flows, Ph.D. Thesis, The 
Weizmann Institute of Science, Rehovot, 1991. 

14. Brezzi, F.; and Fortin, M.: Mixed and Hybrid Finite Element Methods, Springer- 
Verlag, New York, 1991. 

15. Girault, V.; and Raviart, P.-A.: Finite Element Methods for Navier-Stokes Equa- 
tions, Lecture Notes in Mathematics 749, Springer- Verlag, Berlin, 1981. 



16. Girault, V.; and Raviart, P.-A.: Finite Element Methods for Navier-Stokes Equa- 
tions, Springer- Verlag, Berlin, 1986. 

17. Gunzburger, M. D.: Finite Element Methods for Viscous Incompressible Flows, 
Academic Press, Boston, 1989. 

18. Adams, R. A.: Sobolev Spaces, Academic Press, New York, 1975. 

19. Dautray, R.; and Lions, J. L.: Mathematical Analysis and Numerical Methods for 
Science and Technology, Vols. 1-5, Springer- Verlag, Berlin, 1988-92. 

20. Elman, H. C.: Multigrid and Krylov Subspace Methods for the Discrete Stokes 
Equations, Seventh Copper Mountain Conference on Multigrid Methods, NASA 
CP-3339, 1996. 

21. Elman, H. C.; and Golub, G. H.: Inexact and Preconditioned Uzawa Algorit hms for 
Saddle Point Problems, SIAM J. Numer. Anal., vol. 31, no. 6, 1994, pp. 1645-1661. 

22. Nicolaides, R. A.: Existence, Uniqueness and Approximation for Generalized Saddle 
Point Problems, SIAM J. Numer. Anal., vol. 19, no. 2, 1982, pp. 349-357. 

23. Bernardi, C.; Canuto, C.; and Maday, Y.: Generalized Inf-Sup Conditions for 
Spectral Approximation of the Stokes Problem, SIAM J. Numer. Anal., vol. 25, no. 6, 
1988, pp. 1237-1271. 

24. Gunzburger, M. D.; and Hou, S. L.: Treating Inhomogeneous Essential Boundary 
Conditions in Finite Element Methods and the Calculations of Boundary Stresses, 
SIAM J. Numer. Anal., vol. 29, no. 2, 1992, pp. 390-424. 


563 



Page intentionally left blank 


MULTIGRID ACCELERATION OF TIME-ACCURATE 
NAVIER-STOKES CALCULATIONS 


N. Duane Melson 
NASA Langley Research Center 
Hampton, VA 23681-0001 


Mark D. Sanetrik 

Analytical Services and Materials, Incorporated 
Hampton, VA 23681-0001 


SUMMARY 


A numerical scheme to solve the unsteady Navier-Stokes equations is described. The scheme is 
fully implicit in time and is unconditionally stable (at least for first- and second-order discretizations 
of the physical time derivatives). With unconditional stability, the choice of the time step is based on 
the physical phenomena to be resolved rather than limited by numerical stability. This is especially 
important for high Reynolds number viscous flows, where the spatial variation of grid cell size can 
be as much as six orders of magnitude. 

A multigrid-multiblock, steady-state, three-dimensional Navier-Stokes solver, TLNS3D, was mod- 
ified to iteratively invert the equations at each physical time step. The implementation of this procedure 
in TLNS3D is discussed. The implications of applying several popular turbulence models to unsteady 
flow are also considered. Numerical results are presented to show the application of the scheme to 
various two-dimensional turbulent flows. The results of a three-dimensional laminar flow calculation 
are also given. 


INTRODUCTION 


Although significant progress has been made in the last twenty years to numerically model many 
physical situations, most numerical schemes are limited to the prediction of steady flows. This 
limitation is particularly true in the field of computational fluid dynamics (CFD), where solutions to 
the Navier-Stokes equations for steady flows are now calculated on a regular basis. (See, for example, 
references [1-3].) An important factor that has lead to the increased use of Navier-Stokes solvers is the 
recent success in reducing the computer resources necessary to obtain converged solutions. Perhaps 
the most promising work has been in the use of multigrid acceleration techniques. Convergence to 
steady state has been shown in 0[log(n)] work, where n represents the number of unknowns to be 
solved. This reduction in computer requirements has made steady-state solutions affordable to the 
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practicing engineer. 

However, many physical phenomena (e.g., separated flows, wake flows, buffet) are intrinsically 
unsteady. The solution of unsteady problems in CFD has been limited to simplified subsets of the 
Navier-Stokes equations (panel methods, potential-flow solvers, and some limited use of Euler equation 
solvers). Unsteady Navier-Stokes calculations have been too expensive for routine use. 

The present approach is to apply an iterative procedure for the solution of an implicit equation; 
thus, the approach is called an iterative-implicit method. The concept is not new; in fact, many of 
the methods developed in the field of linear algebra for inverting large matrices are .iterative. Within 
the field of CFD, similar work is discussed by Jameson [4] for unsteady flows and by Taylor, Ng, 
and Walters [5] for steady-state flows. The present approach is similar to that of Jameson in that 
a Runge-Kutta-based multigrid method is used to solve the implicit unsteady flow equations. The 
Navier-Stokes equations have been treated in the present work, and Jameson’s implementation has 
been modified so that the robustness of the scheme is dramatically increased. Later work by Belov, 
Martinelli, and Jameson [6] has incorporated the modifications used in the present work as given 
below and in reference [7]. 

A summary description of the implementation is given below. (Details of the implementation and 
analysis of the method are given in a previous paper [7].) A discussion of the use of current ‘steady’ 
turbulence models is then given. Numerical results from laminar and turbulent two-dimensional test 
problems are then presented, as well as the results from a three-dimensional laminar calculation. 


GOVERNING EQUATIONS 


In the present work, a modified version of the thin-layer Navier-Stokes (TLNS) equations is used 
to model the flow. The equation set is obtained from the complete Reynolds-averaged Navier-Stokes 
equations by retaining only the viscous diffusion terms normal to the solid surfaces. For a body-fitted 
coordinate system (£, i), () fixed in time, these equations can be written in the conservation-law form as 


5 , j v dF dG dH dF v dG v dH v 
dT [ > <9£ dr) + d( d£ dr) dC 

where U represents the conserved variable vector and F, G, and H represent the convective flux 
vectors. In the above equation set F v , G v , and H v represent the viscous flux vectors in the three 
coordinate directions (£, rj, (), and J is the Jacobian of the transformation. These equations represent 
a more general form of the classical thin-layer equations introduced in reference number [8] because 
the diffusion terms in all three coordinate directions are included in this form. The Euler equations can 
easily be recovered from equation (1) by simply dropping the last three terms on the right-hand side. 
The effects of turbulence are modeled through an eddy-viscosity hypothesis. The Baldwin-Lomax 
[8J, Spalart-Allmaras [9], and Menter shear-stress transport [10] turbulence models are currently 
implemented to provide turbulence closure. 
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The temporal derivatives are cast as a fully implicit operator in physical time. For first- or 
second-order discretizations in time, this produces an unconditionally stable scheme, which allows 
the time-step size to be chosen based on the temporal resolution needed in the solution rather than 
limited by the numerical stability requirements. The fully implicit terms are iteratively solved with 
multigrid acceleration rather than direct inversion, which would be too costly for the nonlinear three- 
dimensional Navier-Stokes equations. 


IMPLEMENTATION OF TIME-DEPENDENT METHOD 


Original TLNS3D Method 


In the original TLNS3D program, a semidiscrete cell-centered finite-volume algorithm, based on 
a Runge-Kutta time-stepping scheme [1, 11, 12], is used to obtain the steady-state solutions to the 
TLNS equations. A linear fourth-difference-based and nonlinear second-difference-based artificial 
dissipation is added to suppress both the odd-even decoupling and the oscillations in the vicinity of 
shock waves and stagnation points, respectively. Both the scalar and matrix forms of the artificial 
dissipation models [13] are incorporated. 

In the steady -state implementation, the physical time T is replaced by a pseudo time r, which gives 


5 / _ dF dG dH dF v d G v dH v 

dr ' ' dt ; drj d( d£ drj <9( 

At steady state, the left-hand side of equation (2) disappears, and the right-hand side (the residual) 
goes to zero, so that any stable scheme may be used to advance the solution in pseudo time. 

In the original TLNS3D program, the solution is advanced with a five-stage Runge-Kutta time- 
stepping scheme. Three evaluations of the artificial dissipation terms (computed at the odd stages) 
are used to obtain a larger parabolic stability bound, which allows a higher. CFL number in the 
presence of physical viscous diffusion terms. Such a scheme is computationally efficient for solving 
both the steady Navier-Stokes and the steady Euler equations. The stability range of the numerical 
scheme is further increased with the use of an implicit residual smoothing technique that employs grid 
aspect-ratio-dependent coefficients [1, 14, 15]. 

The solution is advanced in pseudo time with the maximum allowable time step for each cell. 
The efficiency of the steady numerical scheme is also significantly enhanced through the use of 
a multigrid acceleration technique as described in reference [1]. The original TLNS3D program 
was extensively modified to facilitate solution of the flow fields over a wide range of geometric 
configurations through domain decomposition. This multiblock version of TLNS3D is referred to as 
TLNS3D-MB. A consequence of this work is the generalization of the boundary conditions of the 
program to easily accommodate any arbitrary grid topology. A detailed description of this capability 
is given in reference [16]. 
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Time-dependent TLNS3D-MB 


In the steady-state version of TLNS3D-MB, the following multistage Runge-Kutta scheme is used 
to solve (2): 


W(°) _ y/m 


w (k) = w (0) + akAr j-l 


C^-'Xw) - Dp k \w) - d\ K} {W) + F^-^iW) 


(*), 


( 3 ) 


y/rn+l _ yy(K) , 


where W is the solution vector for the discrete formulation, m is the counter for the Runge-Kutta 
iterations, (k) is the k th of K Runge-Kutta stages, is the coefficient for the k th Runge-Kutta stage, 
C is the convective operator (evaluated at the previous Runge-Kutta stage), D v and D a are the physical 
and artificial dissipation operators (evaluated at a linear combination of previous Runge-Kutta stages), 
and F is the multigrid forcing function. The above solution procedure can be thought of as placing 
the equation to be solved (in this case the steady-state Navier-Stokes equations) on the right-hand side 
of the equation and adding a pseudo-time term on the left-hand side. (See equation (2).) The same 
type of procedure is used in the time-accurate version of TLNS3D-MB. In this case, however, the 
unsteady Navier-Stokes equations are placed on the right-hand side: 


d 

dr 


d 




dF dG dH 8F V 0G V dH v 


dt, + dr) + d( 


di dr] d( 


( 4 ) 


The physical time derivative is then approximated as a finite difference, and the same type of Runge- 
Kutta scheme is used to advance the solution in pseudo-time: 

W(°) = w m 


aj.Ar J" 


W ^ k ) = W^+ 

- D^\w) - D^\w) + F^ k ~ 1 \W) 


1 - w n 


( 5 ) 


J " 1 


At 


W m+ 1 = W^ K \ 

where n is the physical time step counter. Note that for simplicity the physical time derivative has 
been written as a first-order derivative; a higher order discretization can be used if more accuracy is 
desired. Also note that all terms in (5), except for the second term in the physical time derivative, 
are evaluated at the new physical time level n + 1. 
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Equation (5) cannot be solved directly because the term appears on both sides of the 

equation. Solving (5) for W gives 


(l+a k X)W {k) = 
W (0) + a k A tJ ~ 1 


C {k ~ 1} {W) - D^ k \w) - D^\w ) + F ( ' k ~ 1 \W) + 


1 W n ' 
J- 1 At 


( 6 ) 


where A is the ratio of pseudo and physical times However, (6) also is unacceptable because 
the right-hand side does not go to zero as the Runge-Kutta iteration converges. The final form for 
the k th Runge-Kutta stage for the time-dependent version of TLNS3D-MB is obtained by adding and 
subtracting the term ajr.AW^' -1 ^ to the right-hand side of the equation: 


(1 + a k X)W {k) = W {0] + a k .XW {k - 1} + 


a k ArJ 


-l 


C ik - 1] {W) - D^\w) - d\ K] {W) + F {k ~ l \W) 


(*), 


1 w(fc-i) _ w n 


(7) 


J 


-l 


At 


For second-order discretization of the physical time derivative, this becomes: 


1 + ^a k \^W {k) = W (0) + a k XW {k - 1] + 


a k ArJ 


-l 


C {k ~ 1] {W) - D^'\w) - D K a ] {W) + F^-^iW) 


( k ) l 


1 3W( k ~V -AW n + W 11 - 1 ' 
2A t 


( 8 ) 


The Baldwin-Lomax turbulence model is considered a zero-equation turbulence model and is 
implemented as part of the solution of the Navier-Stokes equations. The one- and two-equation 
turbulence models are implemented such that their solution is decoupled from the Navier-Stokes 
equations. They do not contain physical time derivatives and are not treated in a time-accurate manner. 
From a heuristic standpoint, they can be considered frozen in time. The results presented below indicate 
that this is an acceptable implementation for the class of problems considered. Subsequent work [17] 
has indicated that the physical time derivatives should be included in the turbulence model to insure 
accuracy for a wide range of flows. 


NUMERICAL RESULTS 


To demonstrate the capability of the present method, the results of several numerical experiments 
are given. The first case that is examined is the unsteady flow over a two-dimensional circular cylinder 
with a Reynolds number of 3000 and a free-stream Mach number of 0.2. If the flow about the cylinder 
is impulsively started, the initial flow is symmetric with zero lift as the wake behind the cylinder begins 
to grow. As the wake continues to grow, it becomes unstable and begins to shed from alternate sides 
of the cylinder. This shedding is periodic in nature and is characterized by the Strouhal number. The 
experimentally obtained value of the Strouhal number for the above conditions is 0.21. 
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The present scheme was used to calculate the fully developed vortex shedding flow around the 
cylinder. Two different grids were generated for the calculations; a fine grid with 257 x 129 points 
around and normal to the cylinder, respectively, and a coarse grid generated by deleting every other 
point from the fine grid. The fine grid was generated using an algebraic method with simple power 
law stretching. The normal spacing at the cylinder for the fine grid was 0.0001 times the diameter of 
the cylinder, and the grid extended to 20 diameters from the center of the cylinder. The coarse grid 
had a normal spacing of 0.0002 with the same outer boundary. Points were clustered in the wake 
region for better resolution, as shown for the coarse grid in figure 1 . Results were obtained for two 
time step sizes for both the Baldwin-Lomax and Spalart-Allmaras turbulence models. Second-order 
discretization of physical time was used for all unsteady calculations. The larger, nondimensional 
time step size of 0.4 gave approximately 50 time steps per period. The smaller time step of 0.2 gave 
approximately 100 steps per cycle. The predicted Strouhal number for each combination of grid, time 
step, and turbulence model is presented in table 1 . The percent difference from the experimental value 
is given in parentheses. As would be expected for separated flow, the Spalart-Allmaras turbulence 
model produced more accurate results for each grid/time step combination. Time histories of the lift 
coefficient Q are shown in figures 2 and 3 for the Baldwin-Lomax and Spalart-Allmaras turbulence 
models. The small effect of the reduction of the time step size indicates that the larger time step (with 
50 time steps per cycle) is adequate to predict the Strouhal number. The difference in the results due 
to the change in grid spacing is much larger than the effect due to changes in the time step size. 


Table 1. Predicted Strouhal Number for Circular Cylinder (Mqo = 0.2, Red = 3000) 



Baldwin-Lomax 

Spalart-Allmaras 

coarse grid 

fine grid 

coarse grid 

fine grid 

o 

o 

II 

<1 

0.197 (6.2%) 

0.201 (4.3%) 

0.211 (4.8%) 

0.207 (1.4%) 

At = 0.20 

0.198 (5.7%) 

0.202 (3.8%) 

0.219 (4.3%) 

0.208 (1.0%) 


The second configuration considered was a two-dimensional rectangular cavity in a flat plate. To 
model a configuration tested experimentally [18], a cavity length of 3.0 inches and height of 0.5 inches 
were considered. The flat plate extended 10.4 inches upstream of the cavity. This gives a length to 
height ratio (L/H) of 6. A free-stream Mach number of 0.3 and a Reynolds number of 300,000/inch 
were used. A transition grit was applied near the plate leading edge to force the boundary layer to 
transition to a turbulent boundary layer and for these conditions, no tones were generated and the 
flow was nearly steady. 

A nonreflecting boundary condition was applied at the inflow boundary 21.6 inches ahead of the 
cavity. (See figure 4.) The upper computational boundary was set at 10 inches above the plate where 
a nonreflecting boundary condition was applied. An extrapolation boundary condition was applied at 
the outflow boundary 39.1 inches aft of the cavity. An algebraic grid generation technique was used 
to generate a two-block grid with 49 x 56 points in the cavity and 129 x 49 points above the cavity 
and flat plate. Power law stretching was used to cluster points near the flat plate and the cavity walls 




















and floor with a spacing of 0.005 inches. A cosine function was used to transition from the clustered 
grid near the surface to a specified fraction of uniform spacing near the far boundaries. (See figure 5.) 

To obtain reasonable starting conditions, TLNS3D-MB was run in steady mode (pseudo-time 
marching). After a reasonable number of multigrid cycles, the calculation was stopped and then 
restarted in unsteady mode with second-order physical time discretization. It has been found that this 
is an effective method for starting unsteady calculations. The lift histories for a laminar calculation 
and turbulent calculations using the Baldwin-Lomax, Spalart-Allmaras, and Menter models are shown 
in figure 6. Note that the laminar results exhibit periodic behavior, while the turbulent results appear 
to approach a steady solution. The turbulent cases were all started from an unsteady laminar solution 
to try to force oscillations, but all models showed a damping of the oscillations. Detailed examination 
of the solution shows small oscillations, but the predicted flow is essentially steady. This result is in 
line with experimental observations of the differences between laminar and turbulent flows in cavities 
[19-21], The topology of the flow field in the cavity predicted by the turbulent runs is characterized 
by a large recirculation region that fills most of the cavity. Small secondary vortices are also present 
in the lower comers of the cavity. A sample of this is shown in figure 7. The topology of the laminar 
solution is very different. Multiple nonstationary vortices appear in the cavity and then either die out 
or are convected out of the cavity. Streaklines at various times are shown in figures 8 and 9. 

The computed pressure coefficient along the centerline of the floor of the cavity from the present 
turbulent calculations is compared with experimental values in figure 10. Once again, the agreement 
for the Baldwin-Lomax model is not as good as for the one- or two-equation turbulent models for 
separated flow. None of the models predicts the high pressure at the rear of the cavity as seen in the 
experimental data. This result may be due to three-dimensional effects in the experiment. 

To demonstrate the capability of the present method to calculate three-dimensional flows, a three- 
dimensional laminar calculation was performed for the same L/H = 6 cavity with a width to height 
ratio (W/H) of 5. The surface grid and a portion of the outer boundary for this calculation are 
shown in figure 11. The two-dimensional grid shown previously is the grid from the cavity centerline 
plane from this three-dimensional grid. The lift and drag (based on integrated pressures) histories 
of this calculation are shown in figure 12. The flow exhibits the same unsteady properties that the 
two-dimensional laminar calculation contained, although large three-dimensional effects are apparent, 
as evidenced by the streaklines for a selected time shown in figure 13. This calculation required 
approximately 50 CPU hours on a Cray C-90. 


CONCLUSIONS 


A method to accurately calculate solutions to the unsteady Navier-Stokes equations has been 
presented. Multigrid acceleration has been successfully employed to accelerate the calculations of the 
iterative-implicit method. Examples for two-dimensional turbulent flow past a circular cylinder and 
a rectangular cavity, using the Baldwin-Lomax, Spalart-Allmaras, and Menter shear-stress transport 
models, have been presented to show that a frozen implementation of these ‘steady’ turbulence models 



can give good results for these unsteady separated flows. The time-dependent scheme has also been 
demonstrated for a three-dimensional laminar calculation. 
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Figure 1. Coarse cylinder grid (129 x 65). 


573 


2.0 


- 3.0 



0.0 


50.0 


100.0 


time 


150.0 


Figure 2. Lift history for circular cylinder with 
Baldwin-Lomax turbulence model (Moo=0.2, ReD=3000). 
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Figure 3. Lift history for circular cylinder with 
Spalart-Allmaras turbulence model (Moo =0.2, Reo=3000). 
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Figure 4. Schematic of two-dimensional rectangular cavity computational domain. 



Figure 5. Grid for two-dimensional rectangular cavity calculations (L/H=6). 
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Figure 7. Sample streaklines for turbulent (Spalart-Allmaras) calculation 
of two-dimensional cavity (L/H=6, Moo =0.4, Re=300, 000/inch). 
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Figure 8. Sample streaklines at T=109.5 for laminar calculation 
of two-dimensional cavity (L/H=6, Moo =0.4, Re=300, 000/inch). 



Figure 9. Sample streaklines at T= 120.75 for laminar calculation 
of two-dimensional cavity (L/H=6, Moo=0.4, Re=300, 000/inch). 
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Figure 10. Pressure coefficient along cavity floor for two-dimensional 
rectangular cavity (L/H=6, Mqo=0.4, Re=300, 000/inch). 



Figure 11. Surface grid for three-dimensional rectangular cavity calculations (L/H=6, W/H=5). 
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Figure 13. Sample streaklines for laminar calculation of 
three-dimensional cavity (L/H=6, W/H=5, Mqo=0.4, Re=300, 000/inch). 
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INTRODUCTION 


In this paper we consider the simultaneous flow of oil and water in reservoir rock. This displacement 
process is modeled by two basic equations (see, e.g., [1]): the material balance or continuity equations 
and the equation of motion (Darcy’s law). For the numerical solution of this system of nonlinear 
partial differential equations there are two approaches: the fully implicit or simultaneous solution 
method and the sequential solution method. 

In the sequential solution method the system of partial differential equations is manipulated to 
give an elliptic pressure equation and a hyperbolic (or parabolic) saturation equation. In the IMPES 
approach the pressure equation is first solved, using values for the saturation from the previous time 
level. Next the saturations are updated by some explicit time stepping method; this implies that 
the method is only conditionally stable. For the numerical solution of the linear, elliptic pressure 
equation multigrid methods have become an accepted technique. (See, e.g., [2], [3], [4].) 

On the other hand, the fully implicit method is unconditionally stable, but it has the disadvantage 
that in every time step a large system of nonlinear algebraic equations has to be solved. The most 
time-consuming part of any fully implicit reservoir simulator is the solution of this large system of 
equations. Usually this is done by Newton’s method. The resulting systems of linear equations are 
then either solved by a direct method or by some conjugate gradient type method. 

In this paper we consider the possibility of applying multigrid methods for the iterative solution 
of the systems of nonlinear equations. There are two ways of using multigrid for this job: either 
we use a nonlinear multigrid method or we use a linear multigrid method to deal with the linear 
systems that arise in Newton’s method. So far only a few authors have reported on the use of 
multigrid methods for fully implicit simulations. In [5] a two-level FAS algorithm is presented for 
the black-oil equations, and linear multigrid for two-phase flow problems with strong heterogeneities 
and anisotropies is studied in [6]. Here we consider both possibilities. Moreover we present a 
novel way for constructing the coarse grid correction operator in linear multigrid algorithms. This 
approach has the advantage in that it preserves the sparsity pattern of the fine grid matrix and it 
can be extended to systems of equations in a straightforward manner. We compare the linear and 
nonlinear multigrid algorithms by means of a numerical experiment. 

EQUATIONS 


In the absence of gravity forces the volumetric flow rate of water and oil in a porous medium is given 
by the generalized Darcy’s law 

5 a = -A a VP a , a = w,o, (1) 

where q a , X a , and P a are the Darcy velocity, the mobility, and the pressure of phase a, respectively. 
The saturation of phase a is denoted by S a , so 

S w + S 0 = 1. (2) 
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The phase mobilities A Q are defined by 


Ac 



a = w,o, 


( 3 ) 


where k is the rock permeability, k a (S a ) is the phase relative permeability, and p a is the phase 
viscosity. In addition to these momentum equations we have mass conservation laws for both phases: 

d S 

' ^-dt~ + V ’ qa + = a = VJ '°< ( 4 ) 

where <f> is the porosity of the rock and Q a is the production rate of phase a. The phase pressures 
P a are related through the capillary pressure P e : 


Pc(S w ) = P 0 - P w . (5) 

The equations (l)-(5) are the partial differential equations that make up the incompressible two- 
phase flow model. In the sequel we use S w and P a as the independent variables and drop the 
subscripts. 

We still have to specify the boundary conditions. Usually the flow across well boundaries is 
modeled by point sources and sinks, and no flow boundary conditions are imposed at the boundary 
of the reservoir. This has the effect of shifting all complications to a proper modeling of the injection 
and production wells. 


DISCRETIZATION 


In this section we describe the fully implicit discretization of the multiphase flow equations. For ease 
of notation we assume a uniform porosity <f> and rock permeability k. Moreover we only consider 
the two-dimensional case with a uniform Cartesian grid. The equations are discretized in space by 
a finite volume scheme (cell-centered finite-differences or box scheme). For the time integration the 
backward Euler method is used. This leads to the system of equations 




i ((«-)?■& - + (9-C+* - = 0. 

~(SZt 1 -Sl j ) + (Q 0 )l+ 1 + 

l ((«.&& - + (*«* - (*«£*) = 0 . 


( 6 ) 


( 7 ) 


In the above, h denotes the mesh width; the subscripts i, j, the discretization cell; and the superscript 
n, the time level. The fluxes at the edges between cells are approximated with upstream weighted 
mobilities. For example, the fluxes (q a )™+i j at the edge between the cells i,j and i, j + 1 are 
approximated by 






pn + l _ ( p \n + l _ pn + 1 , j p \n + l 
M \n + l -*1 + 1, j + r i,j ’ \ 

h i 

pn + 1 p^ + 1 

l\ l n + 1 t+bj i,3 


( 8 ) 

( 9 ) 


with 





r +i ) 

Ma ’ 


if (P^H 1 < 0, 

if C^)^. - > 0. 


( 10 ) 


In the case of nonuniform rock permeability k, the permeability k. +1 at the cell edge is the harmonic 
average of the values in the adjacent cells. 
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MULTIGRID 


In each time step we have to solve the large system of nonlinear equations (6)-(10). We consider 
cell-centered multigrid methods for the iterative solution of these systems. In cell-centered multigrid 
methods the coarser grids G 2h , G 4h , ■ ■ ■ are constructed by successively doubling the mesh width 
of the fine grid G h . Hence, each coarse grid cell is the union of four fine grid cells. In this paper 

we focus on the coarse grid correction. Suppose that on the fine grid G h we have the system of 

equations 

Af h {u h ) = f h , (11) 

where S/ h is a possibly nonlinear operator. The coarse grid corrections that we consider- are of the 
form 

fj 2h (u 2h ) = Af 2h (u 2h ) +R h 2h (f h -M h {u h )), (12) 

u h = u h + Pl h {u 2h - u 2h ), (13) 

where R^ h denotes the restriction that is the adjoint of the interpolation by a piecewise constant 

function. In the cell centered multigrid method this is natural: the residual (the total excess of 
accumulation and net flow) in a coarse grid cell is the sum of the residuals in the corresponding 
four fine grid cells. The prolongation P£ h is the piecewise bilinear interpolation. This combination 
of prolongation and restriction is formally sufficiently accurate to deal with second order partial 
differential equations. 

We will now develop two multigrid methods for (6)-(10). In the nonlinear multigrid method (the 
FAS algorithm [7]) we deal with this nonlinear system of equations directly, so M h is a nonlinear 
operator. On the other hand, in the linear multigrid method Af h is the Jacobian matrix of the system 
of nonlinear equations. We present a novel way to construct the coarse grid correction operator for 
the linear multigrid algorithm. 


Nonlinear Multigrid 

The nonlinear multigrid method that we use is the FAS algorithm. To obtain the coarse grid 
operator M 2h the problem is discretized on the coarse grid (i.e. , a grid with mesh size 2 h). There 
are only homogeneous boundary conditions; therefore, the treatment of the boundary conditions on 
the coarse grids is trivial. If there is a well in a grid cell on the fine grid, then it is also present 
in all father cells on coarser grids. Because the problem is nonlinear, the properties of the coarse 
grid operators are determined by the choice of u 2h . Here we take u 2h = R\ h u h , where R 2h is the 
interpolation by piecewise constants. 

We use a collective point Gauss-Seidel-Newton method as the smoother in this multigrid algo- 
rithm. This means that all cells are visited in some predetermined order, and equations (6) and 
(7) are solved simultaneously for the variables related to that cell. This system of two nonlinear 
equations is solved by Newton’s method. 


Linear Multigrid 

We can also use multigrid to solve the linear systems of equations that occur when applying Newton’s 
method on the fine grid G h . Let us again consider the construction of the coarse grid linear operator 
that is used in the coarse grid correction. Basically there are two ways to define this coarse grid 
operator. Given prolongation and restriction operators we can define the coarse grid operator as the 
Galerkin approximation to the fine grid operator; this is done in [6]. This approach is straightforward 
but it has a disadvantage in that for simple linear elliptic equations the coarse grid matrix may loose 
the M-matrix property. Moreover, the stencils of the coarse grid operators are often denser than 
the corresponding fine grid stencil (cf. [8]). The alternative approach is to discretize the problem 
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on the coarse grid as in the nonlinear multigrid algorithm and to use the Jacobian of the nonlinear 
coarse grid operator as the coarse grid operator for the linear multigrid algorithm. An advantage to 
this approach is that all nice properties of the fine grid operator are immediately carried over to the 
operators on the coarser grids. We now try to combine these approaches; the coarse grid operator 
is defined by means of a Galerkin-like construction that is based on the coarse grid discretization 
approach. 


To explain this construction we consider a simple one-dimensional, second-order scalar conser- 
vation law 

£-'■ < 14) 


where q is some function of the solution u and ^ . A simple finite volume discretization on the fine 
grid G h with uniform mesh width h leads to a system of equations of the form 


«i+i 


■rf- 


htf. 


(15) 


with 


( U i > U i + 1’ M - 


(16) 


Suppose that we use Newton’s method on the grid G h . In a single iteration step we then solve the 
following problem: find Auj 1 such that 



hfi - (g t >i - - Mi+i - i. 

(17) 

with 

dnf ‘ + 3«J t i ' +1 ' 

(18) 

This can be written 

in matrix form: 



J h Au h - f h . 

(19) 

For example, let us 

consider the linear convection-diffusion equation 



d ( diA 

(20) 


dj(“ +e d;) = 0 ' 


with boundary conditions u(0) = 0 and ii(l) = 1. A forward discretization for the convective term 
yields 


9?+i («?.«?+!. A) 


X i + 1 


4-6' 


H+i 


( 21 ) 


If we use discretization on the coarse 

by 


grid G 2h to define the coarse grid operator, its stencil is given 




( 22 ) 


Interpolation by piecewise constants, which is the natural choice for prolongation and restriction in 
multigrid algorithms for finite volume schemes, is of course insufficiently accurate for this second 
order problem. However, if we construct the coarse grid operator as the Galerkin approximation 
using these natural transfer operators, we obtain the coarse grid stencil 



(23) 


Clearly the treatment of the second order diffusion term is different for the finite volume discretization 
approach (22) and the Galerkin approximation (23). 

We compare the efficiency of these two methods by means of a simple numerical experiment. 
We take the convection-diffusion equation (21) with e = 0.01. In both cases we use a restriction 
that is the transpose of piecewise constant interpolation and a prolongation by a piecewise linear 
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h 

h 

2e 

Galerkin 

FVD 

v = 1 

II 

V- 1 

C4 

II 

1/8 

6.25 

0.60 

0.38 

0.53 

0.36 

1/16 

3.12 

0.58 

0.42 

0.54 

0.37 

1/32 

1.56 

0.55 

0.44 

0.50 

0.35 

1/64 

0.78 

0.54 

0.45 

0.47 

0.31 

1/128 

0.39 

0.54 

0.47 

0.45 

0.32 

1/256 

0.20 

0.53 

0.47 

0.42 

0.30 

1/512 

0.10 

0.53 

0.47 

0.43 

0.28 


Table 1: Two-level convergence rates for the linear convection-diffusion equation with two different 
coarse grid operators: the Galerkin approximation and Finite Volume Discretization. 


function. For smoothing we apply damped Jacobi relaxation with a damping factor of 2/3. We 
do not use a Gauss-Seidel smoother because it is an exact solver for the pure convection equation. 
Therefore, it is not suitable for comparing the merits of the different coarse grid correction operators 
in the convection dominated case. Table 1 shows the observed two-level convergence rates for 
both algorithms with one (v = 1) and two (u = 2) smoothing steps. If the mesh Peclet number 
h/2e is greater than 1 (convection dominates), then the convergence rates are comparable for both 
algorithms. Applying two smoothing steps improves the convergence rates, so low frequency error 
components are indeed reduced efficiently in the coarse grid correction. However, when diffusion 
dominates, the two-grid algorithm with the Galerkin approximation performs worse than the coarse 
grid discretization approach. Applying two smoothing sweeps hardly improves the convergence rate 
of the algorithm with the Galerkin approximation. As the grid interpolation operators used in its 
construction are too inaccurate, the coarse grid correction is incorrect. 

Comparing the coarse grid stencils (22) and (23) suggests another approach for the construction 
of the coarse grid matrix (cf. [8]). Let us assume that we can split the derivatives in terms with 
different order behavior with respect to the mesh size h: 


( u i ! u iVl! M 


du } 


— l U i+li fy — ,U ’ + 


P=0,1 




du i+l 


J9 1+1 .« 1+1 («?• U i+ 1> h ) = ( U i’ U i+ 1> h ) 


p—0, 1 


(24) 

(25) 


with 

jq j ,«.(■“ — hAu, u + h&u, h) = 0(h~ p ) for h — > 0. (26) 

For the forward discretization (20) of the linear convection-diffusion equation this leads to a splitting 
in convective and diffusive terms: 



0 , 





Let the matrix J h,p consist of the elements j p ,, , so 


(27) 


jh = J h ' p . (28) 

p=0,l 

We define the coarse grid operator now as follows: 
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where P£ h and R^h are the interpolation by piecewise constants. For the example of the linear 
convection-diffusion equation this yields exactly the same coarse grid operator as the one obtained 
by discretization on the coarse grid (cf. (22)). 

We use this approach for defining the coarse grid operator also in the case of a system of conser- 
vation laws. For our two-phase flow model the fluxes are given by (8), (9), and (10). With obvious 
abuse of notation we define the splitting as follows: 


,-0 

Ja,P. . 

= >0, a = w,o, 


Ja, p i j 

= +(A a )i + ij-£, a- 

w,o, 

l° 

<9(A w)i + ij P i+lij 


3 w , s , j 


h 


- (\) 1 dPc 
~ ^ w) i+bihdS i<j ' 


*0 

P i+i,j - 

~ p i,i 

3 o,s . . 

ds itj h 

» 

Jo.S . . 

= 0. 



(30) 

(31) 

(32) 

(33) 

(34) 

(35) 


The accumulation terms are of course treated as zero order terms. 

We notice that the implementation of (29) is simple due to the fact that we are using piecewise 
constant grid interpolation operators. The entries of the fine grid matrix consist of terms related to 
either cells (the accumulation terms) or to edges (the flux terms). The coarse grid matrices have the 
same structure, where the coarse grid cells consist of four fine grid cells; the coarse grid edges consist 
of two fine grid edges. Because we are using piecewise constant interpolation operators, (29) implies 
that we can simply add the terms related to cells on the finest grid to the corresponding terms in 
parent cells on all coarser grids. Next we calculate the flux terms j p s and j p a p . Each of these 
terms can be associated with a unique edge between two cells. As we are using piecewise constant 
interpolation operators and as the terms s and p appear with opposite sign in the linearized 
discrete equations for the two cells (cf. (6) and (7)), it follows that these terms do not contribute 
to the coarse grid matrix if the fine grid edge is not part of a coarse grid edge. However if the fine 
grid edge is part of a coarse grid edge, we add that coefficient, multiplied by the appropriate scaling 
factor, to the coefficient at the parent edge. This is done recursively until we end at the coarsest 
grid. The splitting in terms related to cells and edges thus yields a straightforward implementation 
of (29). 


NUMERICAL RESULTS 

In this section we show some results for the numerical simulation of the flooding of a typical labora- 
tory scale model. This problem is taken essentially from [9]. The model consists of a thin sand pack 
simulating a quadrant of an infinitely repeating five-spot. Some properties of the model are shown 
in Table 2. The model is placed horizontally, so the gravity effect can be neglected. Initially there 
is a uniform saturation Si in the model. Water is then injected into one corner of the pack at a 
constant rate and oil and water are produced at the opposite corner. Several cases are considered 
with widely varying oil- water viscosity ratios M = {j. 0 /hw (See Table 3.) For these data the flow is 
convection dominated, so steep gradients develop in the water saturation S w . Because the transition 
regions cannot be resolved on the coarser grids, this is an interesting test problem for the multigrid 
algorithms. The functions k a (S) and P C (S) are smooth functions and good approximations to the 
data given in [9]: 

M5) = ’ (36) 
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(37) 


MS') = 0.67 , 

Pc (S) = 0 ff- 'j 62.3 x 10 3 [dyne/cm 2 ]. (38) 

For the discretization of this problem we use several grids. The coarsest grid in all calculations is 
a 5 x 5 grid, and the fine grid contains 80 x 80 grid points, so the total number of unknowns for 
the fine grid is 12800. The calculation is stopped when three times the total pore volume has been 
injected. 

In all time steps the discrete problem is solved with a tolerance r < 1 x 10~ 3 , where r denotes 
the ^2-norm of the residual scaled by the inflow 5, -At in that time step. The total oil balance error 
([initial — final oil in place] /cumulative oil production) is always less than 2 x 10 -4 . The time steps 
A t n are selected in order to have changes in the saturation of approximately 0.05: 


At 


n-f 1 


0.05 




rn-l| 


-At”. 


(39) 


The ratio At n+1 / At n is bounded between 0.5 and 2.0. 

In Figure 1 the numerical approximation of the water saturation after injection of 0.25 times 
the total pore volume is plotted for test problems 1 and 2. In test problem 1 there is a favorable 
mobility ratio M, and the water displaces the oil in a piston-like manner. However, in test problem 
2 we have an unfavorable viscosity ratio. The water saturation at the shock front is now lower 
than in the previous case, and the water breakthrough occurs earlier. This is in agreement with 
the classical one-dimensional Buckley-Leverett theory. Figure 2 shows the volume of produced oil 
versus the volume of injected water expressed in pore volumes. These results are obtained on the 
80 x 80 grid. These production curves are (of course) in good agreement with the results presented 
in [9]. As expected from the Buckley-Leverett theory a large mobility ratio M leads to an inefficient 
oil recovery process. 

For our purposes, the convergence speeds of the two multigrid algorithms that we are considering 
are more interesting. To estimate the convergence speed of the nonlinear multigrid algorithm, we use 
the average residual factor pnmg ■ Here we take the ^2- norm of the residual of the nonlinear discrete 
equations ((6) and (7)). The convergence speed of the linear multigrid algorithm is estimated by 
the average residual factor Plmg . which uses the £ 2 -norm of the residual of the linear equations 
in Newton’s method. In all runs we used F-cycles with a single smoothing step for pre- and post- 
smoothing. Because the flow is basically from the injection corner toward the production corner, 
a single Gauss-Seidel sweep suffices. In more complicated situations a four direction Gauss-Seidel 
method has to be used. 

In Table 4 we show the estimated convergence speeds p on different fine grids for the different 
test cases. In all cases both multigrid algorithms perform satisfactorily; we observe a fast, grid- 
independent convergence behavior. The average residual reduction factor p is always less than 0.15. 
In the nonlinear multigrid algorithm we find that typically three or four F-cycles are needed to satisfy 
the stopping criterion. In the linear multigrid algorithm typically two Newton steps are needed for 
convergence; altogether, typically four F-cycles per time step are needed. In Table 5 the average 
execution times on a HP-735 work-station, are shown. Although our code is far from optimal, two 
tentative conclusions can be drawn from it. First, both algorithms show optimal complexity; the 
time needed per time step and per grid point is independent of the number of grid points. Second, 
the linear multigrid algorithm is more efficient than the nonlinear one. This is due to the fact that in 
the nonlinear algorithm functions like k a (S) and P C {S) (and their derivatives) have to be calculated 
much more often. 


SUMMARY 

We have presented two multigrid algorithms for the fully implicit simulation of incompressible, 
immiscible two-phase flow in a porous medium. The nonlinear multigrid algorithm is a standard 


587 




Figure 1: Water saturation for Test 1 (left) and Test 2 (right) after injection of 0.25 pore volume. 
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Figure 2: Oil production curves for the different test problems. 
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oa 
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0.16 

mi 
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oa 

0.12 

Era 

0.15 

0.14 
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0.10 

0.11 

0.14 

Ha 

0.10 

0.11 

0.14 
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Table 4: Convergence rates for different test cases. 


grid 

LINMLTG 

NLMLTG 

10 x 10 


1.53 

20 x 20 


1.89 

40 x 40 


2.09 

80 x 80 

0.53 

2.09 


Table 5: Typical execution times [msec] per time step per grid point. 
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FAS algorithm. The linear multigrid algorithm that is used to solve linear systems in Newton’s 
method employs a nonstandard construction for the coarse grid matrix. Both algorithms perform 
satisfactorily for a simple 2D test problem. The linear multigrid algorithm appears to be more 
efficient with respect to the execution time needed. 
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ABSTRACT 


Over the years, multigrid has been demonstrated as an efficient technique for solving inviscid flow prob- 
lems. However, for viscous flows, convergence rates often degrade. This is generally due to the required use of 
stretched meshes (i.e. the aspect-ratio AR = Ay/ Ax << 1) in order to capture the boundary layer near the 
body. Usual techniques for generating a sequence of grids that produce proper convergence rates on isotropic 
meshes are not adequate for stretched meshes. This work focuses on the solution of Laplace’s equation, 
discretized through a Galerkin finite-element formulation on unstructured stretched triangular meshes. A 
coarsening strategy is proposed and results are discussed. 


Introduction 

Multigrid method has been shown to be successful for solving elliptic problems. This is mainly due to its 
good damping properties which result from two very simple principles. A usual Fourier analysis demonstrates 
that most of the commonly used solvers effectively damp the high frequencies of a feigna.1. A low frequency 
component of a given signal on a fine mesh becomes a high frequency on a. coarser one, hence the idea of 
solving the same problem on a sequence of meshes where all frequencies can be damped equally and, if 
enough grids are available, only a few iterations will be required to produce a converged solution (for more 
details see [1]). Despite these rather simple considerations, the multigrid algorithm is complex and difficult 
to implement. One of the difficulties resides in the generation of the sequence of grids for unstructured 
meshes. The convergence properties of the multigrid method depend upon the “quality” of these grids. 

A sequence of meshes may be produced through two different methods. First, starting from a mesh that 
is not too fine but correctly represents the problem, finer meshes may be generated through refinement. 
A global refinement, performed through local subdivision of the triangles of the discretization, tends to 
preserve the geometrical features required to obtain an efficient multigrid method. However, this will clearly 
not be efficient in terms of computational cost, hence the local refinement technique where specific regions 
of the mesh are refined and then possibly adapted [2]. Although this method seems more reasonable, it 

‘This research was supported under the NASA contract No. NASI- 19480 while the authors were in residence at 
ICASE. 
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increases the computational time and the complexity of the multigrid algorithm. Another method consists 
in coarsening an existing fine mesh, which has been created to represent accurately the different phenomena 
to be observed. One of the techniques available consists in removing, through a coarsening criterion, a certain 
number of nodes from the initial mesh and to reconnect (retriangulate) the remaining set of nodes. This 
method is especially effective in the case of non stretched meshes [3]. The reconnection usually relies on 
the Delaunay technique [4] that tends to produce the “most equilateral” triangulation for the given point 
distribution and therefore is not easily applicable to stretched meshes. In order to avoid retriangulation, 
the so-called agglomeration technique (see Lallemand et al. [5]) is interesting. The generation of coarser 
meshes consists in the agglomeration, or fusion, of the control volumes of the discretization. However, for 
consistency considerations, when it comes to viscous flows, more accurate intergrid transfer operators are 
required [6, 7]. 

The following study focuses on the 2D Laplace’s equation A u(x,y) = 0, since the poor convergence 
properties of the multigrid technique, observed when solving the Navier-Stokes equations on stretched meshes, 
also appear for the solution of this simpler equation. The purpose of this work is to propose new coarsening 
strategies that will preserve the convergence rate of the usual isotropic multigrid technique. This is defined 
as a semi-coarsening method. This study will show how this process may be extended from the case of 
regular structured grids to totally unstructured meshes. 

The organization of the paper is as follows: the discretization of the 2D Laplace’s equation is introduced 
in Section 1 along with an edge-based data structure. Section 2 recalls the essential multigrid convergence 
properties. The generation of stretched grids is addressed in Section 3. A semi- coarsening algorithm, ex- 
tended to unstructured meshes, is presented in Section 4. Finally, numerous experiments are discussed in 
Section 5. 


1 Laplace’s equation 



Figure 1: Linear basis function tpi. 

The problem consists in solving Laplace’s equation: 

A u(x, y) = 0 on f2 convex polygonal domain. 
u = uq on r. 



( 1 ) 


A Galerkin Finite-Element formulation is used on unstructured triangular meshes. An integration by parts 
results in: 


/ A u (pi duj = — I Vu • Vpi du> + I Vu ■ ft <pi da 
Jn i J fi; JTi 


( 2 ) 
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where ipi is the linear basis function as depicted in Fig.l. If u is piecewise linear, then the Green formula 
and the notations of Fig. 2 result in: 


( v V»i)r, = 2 ^ f ’ k 3 

(V«)r, = {uiii k j + u k Hij - ujn ik ) 

ZA. i 


( 3 ) 


where u, is the value of the solution u on vertex i, A\ is the area of triangle T \ , the vector normal to the 
edge [i, j] and of magnitude equal to the length of the edge. Equation (2) can be rewritten as: 


A u ipi dw - (V ipi) Ti • (V u) Ti du = ■ (Vu) Ti 


( 4 ) 


Moreover, for the considered triangle T \ , (3) can be rewritten as: 


1 

( M x)7't = „ . (AMjjAi/j/; — AujfcAi/jj) 

2 j4, (5) 

( u y)i\ = 2^ (Atx,j A.rjj. — Auj k Axji) 

where Au, 7 = v.j — Uj. A similar formulation can be written for triangle In evaluating the coefficient for 
the edge joining vertices i and j, only the triangles T\ and Tj will yield non-zero contributions. The final 
expression of (4) is thus an edge-based formulation: 


L 


Au Lpj dio 


£ 

edges 


/ Ay ik Ay ]k Ay u Ayji \ 

V A x A 2 ) 


+ 


/ Ax ik Ax jk AxuAxji Y 

\ >1| A 2 ). 


Aujj 


( 6 ) 


where the sum is taken over all incoming edges for vertex i. The geometrical anisotropy is reflected in the 
coefficient associated with each edge. If the length ||ij|| increases (the nodes k and l being fixed) then the 
value of the coefficient decreases. Therefore, considering the domain S2 = (J , T), the maximum coefficient is 
associated with the smallest connecting edge and the minimum with the longest. 


2 Some definitions and convergence results 

Multigrid theory relies on the use of a sequence of nested meshes for solving (1). These meshes represent 
the different spaces where the equation is discretized. In what follows, only two meshes are considered: 
W-l, and 'H.jf with H — 2h and 'Hu C %h C Hf. The discrete problem on the fine grid is written as: 

A h u h = 0 (7) 

A weighted Jacobi relaxation is considered as the basic iterative process or smoother: 

u’,: H = s h u,\ = {I -UJ D h 1 Ah) <, where D h = {A h )u (8) 

In order to use both spaces for solving (7) it is necessary to use transfer operators. A linear interpolation 
P: Hn — > Hi,, defines the prolongation operator, and its transpose H — P*’ : 'Hi, — > 'Hu defines the 
restriction. The 2-Grid iterative operator M/, is then defined by: 

. M h ul = (J-r A,,' R Ah) S"' ul 
= (A h [ - PA n 'R) (A h S%) u)) 

with u\ — v pre-relaxations and v 2 — 0 post-relaxations. 
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One very important feature of a multigrid (MG) algorithm is its mesh-independent convergence. Accord- 
ing to Hackbush [8], mesh- independence for elliptic operators, is achieved through the smoothing property 
(11^/iS/i II < h 2 r](v), where lim,,-^ r/(u) — 0) and the approximation property (H-A^ 1 —PA~j^R\\ = 0(h 2 )). 
Because of its nature, the MG algorithm converges linearly with respect to the number of MG-cycles. 

Morano et al., in [3], showed that this may also be achieved for the Euler and low Reynolds number 
Navier-Stokes equations where the employed meshes are not stretched. However, when highly-stretched 
elements are used (mandatory for high Reynolds number solutions, see [7] for example), this convergence 
greatly deteriorates with classical fully-coarsened (FC) grids. It is no longer linear nor mesh-independent. 
The deterioration in convergence is also observed when the resolution of Laplace’s equation is attempted 
with highly stretched elements, that is, when the mesh is anisotropic. 

3 A sequence of grids 

When very stretched elements are used, the damping properties of the smoother are negligible in the stretch- 
ing direction. Thus, using a full- coarsening strategy will certainly not improve the damping properties, since 
the stretching is fully preserved on larger elements. Moreover, the distribution of nodes in the stretching 
direction will correctly represent the low frequencies of the signal, whereas, in the direction normal to the 
stretching, it will represent the high frequencies. Because of the nature of the smoothers commonly used, the 
multigrid technique damps mainly the high frequencies, hence the idea of semi-coarsening in the direction 
normal to the stretching. 


level 1 

/ \ 

level 2 

/ \ / \ 

level 3 

\ / \ / 

— — level 4 

\ / 

— — I level 5 


Figure 3: Sequence of grids for MSG. 

The semi-coarsening technique is well known and used especially in the structured mesh community. For 
complex geometries, however, multiple directions within the mesh require semi-coarsening. A process named 
Multiple Semicoarsened Grid (MSG) Algorithm was introduced by Mulder [9]. This technique relies on the 
generation of numerous grids that are semi- coarsened (SC) from the finer grid in all possible directions as 
depicted in Fig.3. This ensures proper dissipation of the signal. A multigrid scheme is then implemented 
using all the grids which is complex and costly, especially for 3D problems [10]. Moreover, there is no possible 
extension of this technique to unstructured grids. 

The complexity of the usual multigrid technique also relies on the full-coarsening method. This technique 
consists in removing every second vertex in each direction on a regular structured mesh, which results in 
a number of nodes of the coarse grid decreased by a factor 4. The V-cycle complexity of such a method 
tends to 4/3 WUs (a Work Unit corresponds to the computation of one residual on the fine grid). The 




594 




semi-coarsening technique produces coarse grids with a number of nodes decreasing by a factor 2 and the 
overall complexity tends to 2. Therefore, such a method will cost more per cycle. However, it will be shown 
that this technique allows a much better damping factor than a regular full-coarsening technique in the case 
of stretched meshes. 

The smoothing property is valid for the weighted Jacobi relaxation scheme applied in this study. The 
effect of the approximation property is emphasized since it determines the mesh-independence of the conver- 
gence. This property is verified when the discretized subspaces, defined by the sequence of coarser meshes, 
utilized within the MG algorithm are nested. In this paper, the sequence of meshes is created through 
a semi-coarsening technique followed by a retriangulation. When this strategy is applied to unstructured 
meshes, the nestedness of the meshes is rather difficult to preserve. The nodes of the coarse grid form a 
subset of the nodes of the fine grid which produces node-nested, but not element-nested, grids. 





c. Node-Nested Grid. 


b. Fully-Nested Grid. 




Figure 4: Coarse grid discretizations AR = 1. 


The example depicted in Fig.4 shows how the convergence varies with respect to the nestedness of the meshes. 
A non-stretched 89 node Cartesian mesh defines the fine grid (Fig.4.a). The boundary conditions are those 
defined in Section 5. Three different coarse grids are considered. Each of them is a node-nested grid and 
comprises 25 nodes. Fig.4.b shows a usual fully nested grid. Fig.4.c and d depict randomly coarsened grids. 
On the right side of the grid shown in Fig.4.c a few elements are not nested. Finally, Fig.4.d depicts a 
coarsened grid where the elements are anything but nested. Two-grid experiments (see Section 5.1) are 
performed and Fig.4.e depicts the respective convergence histories. The convergence rate ranges from 0.15 
to 0.31 for such a simple test-case. Therefore, the nestedness of the grids is of extreme importance in the 
quality of the MG performance. Further results may be obtained in [11]. 


4 Semi-coarsening and unstructured meshes 

In what follows is presented a semi-coarsening technique that is applicable to unstructured meshes as well 
as to structured meshes. The technique may be seen as a variant of the Algebraic Multigrid (see [12]) in the 
sense that it necessitates a pre-processing stage that relies on the discretization of the equation for generating 
the coarse grids. As mentioned previously, the Galerkin discretization of Laplace’s equation amounts to a 
sum over edges. The value of the coefficient associated with each edge is determined by the geometry of the 
surrounding elements (triangles). The smaller the length of the edge, the larger the value of the coefficient. 
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The semi-coarsening technique proceeds as follows: once a node is selected to remain on the coarse grid, 
its neighbors must be scanned to determine which one of them has to be removed. The removed node 
corresponds to the edge associated with the largest coefficient. The algorithm is two-fold. First, it has to go 
through the mesh and select the nodes to remain on the coarse grid, and, second, for each selected node, it 
has to determine which of its neighbors is to be removed. The setup employed for coarsening is the same as 
that, used for agglomeration in [13, 7]. 

Unstructured meshes for high-Reynolds number flow computations are essentially comprised of two re- 
gions: one where the aspect-ratio is (very) small, where the viscous effects are dominant, and another one, 
where the aspect-ratio is close to 1, far from the viscous effects (the farfield for example). In order to pre- 
serve the low complexity of an MG algorithm it might be desirable to perform the semi-coarsening only in 
the low aspect-ratio region, whereas a full-coarsening may be applied elsewhere. Again, this is similar to 
an Algebraic Multigrid as described in [12]. This should provide a slightly better complexity than the one 
obtained through semi-coarsening only. The algorithm is written as: 


1. For each node i on the fine grid the average and maximum values of the coefficients coe/; of its 
connecting edges are computed: avg, and max,. 


2. The parameter ft 


1 


N “ avgi 


E 


provides an indication of the anisotropy. 


3. The determination, through a heaplist, of the vertex jpick that remains on the coarse grid is then 
performed. 


4. The removal of the connecting neighbor(s) of jpick is achieved through a coarsening criterion. 

5. Goto [3], 


The heaplist serves as an advancing front. The starting point of the front will determine the quality of the 
subset of nodes which constitute the coarse grid. Since semi-coarsening consists in removing every second 
vertex in the direction normal to the stretching, it. is expected that t.he advancing front, should be initiated 
from the region comprising the lowest, aspect-ratio elements (the surface of an airfoil for example). Therefore, 
t.he following items are incorporated: 

• Technical programming considerations make the front start first, with the boundaries. 

• The body and farfield extrema are retained on the coarse grid in order to preserve the general geometry 
of the discretized domain. 

• The heaplist is determined by a “key-function” [14]. This “key-function” is defined by t.he connecting 
distance (minimum number of edges) to the boundary (or region where t.he aspect-ratio is minimum) of 
t.he unprocessed vertex (not. in the front). The result is a list of edges where the first edge is associated 
with the minimum distance and jpick is its unprocessed vertex. 

Once a node is selected to remain on the coarse grid, a semi-coarsening criterion determines which of the 
vh r ,i nr connecting neighbors of jpick is to be removed: 

1. nb max is defined by the maximum number of nodes to be deleted: 

if max jpick > P avg jpick then nb max — 1 (Semi-Coarsening), 

else nbmax = nb ra \ ge (Full-Coarsening). 

2. The array List.jpj r k contains the available unprocessed neighbors. 
n r iciy the number of deleted nodes, is set equal to 0. 
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3. The determination of the available local maximum coefficient is performed: /oc„, ox = max ( coefi ). 

id List j pick 

4. A node i € Listj p i c k is removed if: coefi = loCm ax and loc, nax > avgj p i c k. That is if its value is 
equal to the maximum local coefficient and if this maximum is greater than the average value of all 
the surrounding coefficients. 

5. The array Listj p j c k is updated along with the number of deleted nodes (ridel n<iel + 1). 

If n de i < nb max goto [3]. ' 

This algorithm clearly provides a semi/full-coarsening (S/FC) technique. Yet, if appropriate, the algo- 
rithm only performs semi-coarsening or full-coarsening. Such an algorithm may be applied to unstructured 
meshes as well as to structured meshes provided the considered discretization relies on an edge-based data 
structure. This algorithm relies on the discretization of the equation to be solved rather than on simple 
geometrical considerations. 

a. Delaunay - Max Min. b. Min Max - Variant. 

Figure 5: Retriangulation techniques. 



Once the subset of nodes of the fine grid is obtained after coarsening, it needs to be retriangulated. The 
reconnection relies here on a Delaunay method. This method has proved useful and efficient when used in 
conjunction with equilateral triangle types of meshes. The coarsening technique utilizing such an algorithm 
was introduced in [15]. Unfortunately, this method does not apply to highly stretched meshes. It usually 
results in a poor reconnection in the region where the nodes of the mesh are not regularly distributed. In 
order to overcome this difficulty, an edge-swapping technique may be employed [16, 17]. The Delaunay 
reconnection of a set of four nodes results in two triangles where the minimum angle is maximized (Fig.5.a). 
In lieu of preserving this connectivity it is possible to swap the edges by minimizing the maximum angle 
of the two triangles (Fig.5.b). This technique has proved very efficient when used with an advancing front 
technique for generating meshes, and is thus employed for the unstructured test-case in this paper. The 
reconnection of the structured coarse grids is performed through the usual Delaunay method. 


5 Results and comments 

In order to validate the previous concept, various test-cases are performed for solving the Laplace’s equation. 
Results are presented on structured and unstructured meshes. The discretization domain for the structured 
cases is defined by a square of surface 1, while the unstructured case is defined by a pentagon plunged in an 
unstructured mesh. A non-stretched structured test-case serves as the standard test-case since it provides 
the best MG convergence. The relaxation parameter oj is equal to 0.85 and no optimization is performed 
here. Two sweeps are performed on the fine grid. The transfer operators are linear and were introduced 
in [18]. All cases are performed with Dirichlet boundary conditions. For the structured test-cases they are 
defined by u(0, x) = 1, u(x, 1) = 2, w(l, .t) = 3 and u(a;,0) = 4, and for the unstructured case they are equal 
to —1 on the body and to 1 on the farfield. For all test-cases, the different grids used are presented along 
with the convergence histories of the various schemes. The convergence histories depict the logarithm of the 
norm of the normalized residual with respect to the number of cycles. This convergence is carried over until 
a residual decrease on the fine grid equal to 10~ 10 . 
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5.1 Two-Grid experiments 

These experiments require a residual decrease on the coarse grid equal to 10” 10 . The semi-coarsening-only 
(nb ma x = 1) option of the algorithm is used for the generation of the coarse grids. 


Non-stretched Meshes. The aspect-ratio is equal to one and the grids are fully-nested. The fine and 
coarse grid, respectively, are similar to those depicted in Fig.4.a and b with 4225 (65 x 65) and 1089 (33 x 33) 
nodes, respectively . The coarse-grid is a manually (M) fully-coarsened grid (i.e. the coarsening algorithm is 
not involved). No anisotropy is encountered here and a solution is obtained after 12 cycles which corresponds 
to a convergence rate of 0.15. 



a. 4257 Node Fine Grid. 



d. 2145 Node SC Grid (C). 


b. 1 105 Node FC Grid (M). c. 2145 Node SC Grid (M). 



e. Resulting Convergence Histories. 
Figure 6: Linear Meshes - AR = 1/4. 


Linear Meshes. A 4257 (33 x 129) node fine grid is built (Fig.6.a) where the distribution of nodes is 
linear in the vertical (normal to the stretching) direction and the aspect-ratio is equal to 1/4. Three types of 
coarser meshes are presented. In Fig.6.b is depicted a manually fully-coarsened 1105 (17 x 65) node coarse 
grid, that represents the classical coarsening technique. In Fig.6.c and d are depicted two semi-coarsened 
grids. The first grid is obtained manually through a vertical semi-coarsening in a 2145 (33 x 65) node 
coarse grid. The second grid is the result of the coarsening algorithm (C) applied to the fine grid. It is a 
2145 node coarse grid. The triangulations of the two semi-coarsened grids appear to be different while the 
subset of nodes are the same. Yet, similar convergences are expected. In Fig.6.e are depicted the various 
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convergence histories. The full-coarsening technique results in a convergence rate of 0.77 while the semi- 
coarsening techniques provide both a convergence rate equal to 0.15, which is identical to the convergence 
rate of the non-stretched test-case. 



a. 4257 Node Fine Grid. b. 1 105 Node FC Grid (M). c. 2145 Node SC Grid (M). 




e. Resulting Convergence Histories. 

Figure 7: Exponential Meshes - AR = 2.4 x 10 -4 . 

Exponential Meshes. A 4257 (33 x 129) node fine grid is depicted in Fig.7.a. The distribution of 
nodes is exponential in the vertical direction. The minimum aspect-ratio is equal to 2.4 x 10 4 and the 
maximum to 2.2. This grid is manually fully-coarsened which produces a 1105 (17 x 65) node coarse grid 
(Fig.7.b). A manually vertically semi-coarsened 2145 (33 x 65) node coarse grid is depicted in Fig.7.c. 
Where the stretching follows the horizontal direction (where the distribution of nodes is more dense) this 
technique will provide the expected result, while the stretching deteriorates in the vertical direction (where 
the distribution of nodes is less dense). A 2141 node coarse grid obtained with the coarsening algorithm is 
depicted in Fig.7.d. In this case the coarsening follows the direction normal to the stretching everywhere in 
the mesh, as can be seen in the less dense region. The full-coarsening technique results in a 0.80 convergence 
rate (Fig.7.e). The manually semi-coarsened grid proves to have a much better convergence rate of 0.28, 
but the best convergence rate of 0.20 corresponds to the automatically semi-coarsened grid. Moreover, the 
vertically semi-coarsened grid shows a change of slope at the end of the convergence. This means that the 
MG algorithm does not perform optimally and does not damp low frequencies correctly, whereas the code 
semi-coaxsened grid provides a linear-type of convergence rate. Therefore, and although both semi-coarsened 
grids have similar numbers of nodes, the coarse grid obtained through the automated coarsening algorithm 
results in more optimal convergence. 
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a. 4225 Node Fine Grid. b. 1089 Node FC Grid (M). c. 2145 Node SC Grid (M). 




Figure 8: Chebyshev Meshes - AR — 0.024. 

Chebyshev Meshes. A 4225 (65 x 65) node fine grid is built where the distribution of nodes is a cosine 
function in both directions. The minimum aspect-ratio is equal to 0.024 and the maximum to 40.73 (Fig.8.a). 
This grid comprises stretched and non-stretched elements. The minimum aspect-ratio cells are essentially 
located on the boundary of the domain, while the maximum aspect-ratio cells are located in the bisectors 
and in the middle of the domain. A manually fully coarsened 1089 (33 x 33) node grid is depicted in Fig.8.b. 
Although no natural manual semi-coarsening technique applies here, a horizontally semi-coarsened 2145 
node (33 x 65) coarse grid is built for comparison purposes (Fig.8.c). The coarsening algorithm resulted in 
a 2115 node coarse grid (Fig.8.d). It is again obvious that the semi-coarsening follows the direction normal 
to the stretching, each region being clearly separated by the bisectors. The fully-coarsened grid provided a 
convergence rate of 0.50, and 0.30 was achieved with the manually horizontally semi-coarsened grid (Fig.8.e). 
A linear type of convergence resulting in a convergence rate of 0.12 was achieved with the code semi-coarsened 
grid. It is interesting to note that, despite the similar number of nodes shared by the manually horizontally 
semi-coarsened grid and the code semi-coarsened grid, they provided different results, and therefore the good 
convergence rate of the code semi-coarsening technique cannot be attributed solely to the number of nodes 
on the coarse grid. 


600 



5.2 Multigrid experiments 

In this section, multigrid experiments are explored in order to demonstrate the robustness of the algorithm in 
producing a sequence of grids that permit efficient MG convergence. The number of grids will vary according 
to the test-case. Two sweeps of the Jacobi relaxation are performed on each level and W-cycles are employed 
since they provide a better resolution of the coarse grid, resulting in better convergence rates. A structured 
Chebyshev and an unstructured test-case are performed with both semi and semi/full-coarsening techniques. 



a. 1 6641 Node Fine Grid. b. 8324 Node SC Grid. c. 6294 Node S/FC Grid. 



d. SC Region. 



e. FC Region. 



f. Resulting Convergence Histories. 
Figure 9: Multigrid Chebyshev Meshes - AR = 0.012. 


The Chebyshev test-case. A 16641 (129 X 129) node fine grid is constructed with a minimum aspect- 
ratio value of 0.012 and a maximum value of 81.50 (Fig.9.a). The semi-coarsening option provides a sequence 
of 7 grids comprising 16641, 8324 (shown Fig.9.b), 4329, 2289, 1211, 652 and 352 nodes, and the semi/full- 
coarsening technique a sequence of 6 grids comprising 16641, 6294 (shown Fig.9.c), 2976, 1077, 559 and 286 
nodes. The respective W-cycle complexities are equal to 11 and 6 WUs. The region where the algorithm 
performs the semi-coarsening is depicted nodewise in Fig.9.d, while Fig.9.e shows where the full- coarsening is 
applied. It is clear that the semi-coarsening is applied to the highly stretched element region as expected. The 
semi- coarsening technique results in a standard-like convergence rate of 0.15 (Fig.9.f). When used only with 
6 grids, this technique requires the coarsest grid to be converged completely, otherwise the process abruptly 
stalls at some low residual value. A convergence rate of 0.17 and a low complexity favor the semi/full- 
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coarsening technique. Yet, the convergence history displays a (slight) change of slope. This indicates that 
the method is sensitive to the quality of the triangulation of the coarse grids. Mesh-independent convergence 
is the purpose of this study, and is only truly achieved with the semi-coarsening technique. The slightly 
poorer type of convergence associated with the semi/full-coarsening technique may be explained by the 
quality of the triangulation of the coarse grid. Full-coarsening in non-stretched regions tends to deteriorate 
the relative difference of aspect-ratio between the highly and non-stretched regions. Moreover, the addition 
of a 7th grid, or even converging the coarsest level, does not change the convergence. 







Figure 10: Multigrid Unstructured - Full- Coarsening - AR = 3.7 x 10 5 . 


The unstructured test-case. In this case (Fig.lO.a), a grid-spacing Ay = 10~ 6 on the body results in 
an average minimum aspect-ratio of 3.7 X 10~ 5 . In Fig.lO.e and f are depicted the zoom of the right upper 
corner and of the wake region respectively in order to show the different type of stretched and non-stretched 
elements that appear in these meshes. A first sequence of 4 fully- coarsened meshes is manually constructed. 
The number of nodes for each level are: 19366, 4955, 1270 and 335. These meshes are depicted in Fig.lO.a 
to Fig. 10. d. The complexity of a W-cycle is equal to 3.2 WUs. 
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a. 9983 Node SC Grid. b. 5189 Node SC Grid. c. 2724 Node SC Grid. d. 1717 Node SC Grid. 



9 10 9 10 


e. 1044 Node SC Grid. f. 589 Node SC Grid. g. Retriangulated Fine Grid. h. Original Fine Grid. 

Figure 11: Multigrid Unstructured - Semi-Coarsening - AR = 3.7 x 10 

The second sequence is obtained with the semi-coarsening technique only. There axe 7 meshes that have 
19366, 9983, 5189, 2724, 1717, 1044 and 589 nodes (Fig. 11. a to Fig.ll.f). The W-cycle complexity is equal 
to 12.5 WUs. The last sequence of meshes results from the semi/full-coarsening technique and provides 7 
meshes (Fig.l2.a to Fig.l2.e): they comprise 19366, 9594, 4708, 2325, 1391, 794 and 424 nodes, resulting 
in a 11 WU W-cycle complexity. SC and S/FC methods required all coarse point sets to be retriangulated 
using the Min-Max Delaunay variant. In order to maintain favorable convergence rates, it was found that 
the fine grid needed to be retriangulated according to the same technique. This can partially be explained 
by the quality of the nestedness of all the grids as seen in Section 3. The fine grid is not depicted here for 
these last two sequences because it would appear similar to the original (Fig. 10. a). However, the difference 
between the original and retriangulated fine grids, mostly confined to wake regions, is illustrated in Fig.ll.g 
and h. 
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a. 9594 Node S/FC Grid. 


b. 4708 Node S/FC Grid. 


c. 2325 Node S/FC Grid. 


d. 1391 Node S/FC Grid. 



Figure 12: Multigrid Unstructured - Semi/Full-Coarsening - AR — 3.7 x 10 -5 . 

Converging the coarsest grid of the sequence of the fully-coarsened grids does not change the convergence 
rates equal to 0.80 (Fig.l2.f). This indicates that the use of an additional coarser grid would not change the 
convergence. Besides, the retriangulation of the entire sequence of the fully-coarsened grids does not change 
the convergence rate of the MG algorithm, whether or not the coarsest grid is converged. The semi/fully- 
coarsened and semi-coarsened grids provide a clear improvement with respect to the usual fully-coarsened 
grids with convergence rates equal to 0.23. The semi/fully-coarsened grids demonstrate a better behavior 
than in the Chebyshev case because they are very similar to the semi-coarsened grids. Indeed, since most 
of the nodes are concentrated in the highly stretched regions, the algorithm performs essentially as a semi- 
coarsening technique. This type of meshes is more similar to exponential-type meshes rather than Chebyshev 
meshes. 
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Figure 13: Significant Results. 
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Concluding remarks 

In Fig.13 are gathered the most significant results. They are separated in two different subsets. Curves 1 
and 2 represent the spectrum of convergences within which the other convergence histories must fit. Indeed, 
curve 1 shows the best convergence and curve 2 shows what is expected when the discretization subspaces 
are only node-nested. All other curves depict the convergence histories of the various test-cases that employ 
the semi-coarsening algorithm. The problem to be solved is the same for all test-cases, only the geometries of 
the discretized spaces differ. The results are straight lines with similar slopes that fall within the predicted 
range. The difference of slopes may be explained by two essential reasons. First, the boundary conditions 
of the structured and unstructured test-cases differ. It is not possible, due to the geometry, to transpose 
exactly the same boundary for both types. Then, it has been shown that the nestedness of the subspaces 
influences the quality of the convergence. It cannot be expected that the unstructured grids be completely 
nested. On the other hand the quality of the triangulation per grid may also damage the convergence. 

In this paper, a new semi-coarsening algorithm relying on the discretization of the equation, which should 
enable flexible applications, has been introduced. Convergence rates for highly stretched unstructured meshes 
have been obtained similar to those for standard Cartesian structured non stretched meshes. Finally, linear, 
hence mesh independent, convergence rates have been demonstrated. The extension of these unstructured 
semi-coarsening techniques to the resolution of the Navier-Stokes equations is planned in the near future. 
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PRECONDITIONING OPERATORS ON 
UNSTRUCTURED GRIDS 


S.V. Nepomnyaschikh* 
March 14, 1996 


Abstract 

We consider systems of mesh equations that approximate elliptic 
boundary value problems on arbitrary (unstructured) quasi-uniform 
triangulations and propose a method for constructing optimal precon- 
ditioning operators. The method is based upon two approaches: (1) 
the fictitious space method, i.e., the reduction of the original problem 
to a problem in an auxiliary (fictitious) space, and (2) the multilevel 
decomposition method, i.e., the construction of preconditioners by de- 
composing functions on hierarchical meshes. The convergence rate of 
the corresponding iterative process with the preconditioner obtained 
is independent of the mesh step. The preconditioner has an optimal 
computational cost: the number of arithmetic operations required for 
its implementation is proportional to the number of unknowns in the 
problem. The construction of the preconditioning operators for three 
dimensional problems can be done in the same way. 


1 INTRODUCTION 

Let Cl C IR 2 be a domain with a piecewise smooth boundary T which belongs 
to the class C 2 and satisfies the Lipschitz condition [18]. In the domain Cl 

‘Computing Center, Siberian Branch Russian Academy of Sciences, 6 Lavrentiev av., 
Novosibirsk, 630090, Russia. The work was partially supported by the ISF under contract 
NPB 000, the grant DRET 93/34/401, the Russian Basic Research Foundation grant 
93-01-01783 
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we consider the boundary value problem 


- 2 TUT ai ^ x ) + °o(x)« = f{x ) , 

i,j = i ux i ax j 




u 


(x) = o , x e r 0 


(i) 


du 

— + a(x)u = 0, seTi 


where 

du ^ . du 

9N = £ 

is the conormal derivative, n denotes the outward normal to F, and To is a 
union of a finite number of curvilinear segments, F = To U Tj, To = To- Here 
f 0 denotes the closure of r 0 . 

By r 0 ) we denote the subspace of the Sobolev space 

H\D, Tq) = {ve H'iD) | v(x) = 0, x G r 0 } . 


cos(n, X { ) 


We introduce a bilinear form a(u,v) and a linear functional l(v) as follows: 

du dv 

j a ij{ x ) 

«>>= ' 


a(u ’ v) = X (,£ ^ + dx + X, cr(x)uu dx 


/(u) = / f(x)v da: . 

Jn 

Let us suppose that the operator coefficients and the right-hand side of prob- 
lem (1.1) are such that the bilinear form a(u,v) is symmetric, elliptic and 
continuous on r 0 ) x To), i.e., 

a(u, v ) = a(v,u) Vu, v G H x (fi, To) 

a o\\u\\Hi {Q) < a(u,u) < ai||u||^ 1(n) Vu G B 1 (Q,r 0 ) 

and the linear functional l(v) is continuous on To): 

|/HI<a||u|U 1(n) WueH\n,To). 
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The generalized solution u € To) of problem (1.1) is, by definition, 

a solution to the projection problem [2] 

u G ^(OjTo): a(u,v) = l(v) Vn G iP 1 ^, To) . (2) 

It is familiar that under these assumptions concerning a(u,v) and l(v) there 
exists a unique solution of problem (1.2). 

Let a positive parameter h be fixed (we always suppose that h is suffi- 
ciently small). Let 

M 

& = U 

2=1 

be a triangulation of the domain D (ft h is assumed to be a closed set). We 
suppose that D h is a quasi-uniform triangulation [5], i.e., there exist positive 
constants l\, l 2 and s which are independent of h and such that 

hh<ri<l 2 h, — <s, i = 

Pi 

where r,- and pi are radii of circumscribed and inscribed circles for the tri- 
angle t,-, respectively. We also assume that the triangulation boundary 
approximates V with an error 0(h 2 ). If Ti = T, we suppose that fi C 0 h ] if 
r 0 = T, we suppose that D h C ft. If r 0 ^ 0 and Ti ^ 0, we make the follow- 
ing assumption: points where the bou ndary cond ition changes should be at 
triangulation nodes, I\ C Cl h and r 0 C (IR 2 \ D h ). Part of approximating 
To will be denoted by Pq, and that for Ti by Tj. For the triangulation Sl h , 
we define the space Hh(D h ) of real continuous functions which are linear on 
each triangle of f l h and vanish at Pq . We extend these functions on 0 \ Vt h 
by zero. 

The solution of the projection problem 

u h eH h (tl h ): a(u\v h ) = l(v h ) \/v h G H h (Sl h ) (3) 

will be called an approximate solution of problem (1.2). Aspects of approxi- 
mation of (1.2) by (1.3) have been thoroughly studied (see [5, 14]); we do not 
consider them here. Each function u h € Hh(D h ) is put in standard corre- 
spondence with a real column vector u £ IR^ whose components are values 
of the function u h at the corresponding nodes of the triangulation f l h . Then 
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(1.3) is equivalent to the system of mesh equations 

Au = f 

( Au , v) = a(u h , v h ) Vu\ v h G H h {D h ) (4) 

(. f,v) = l{v h ) Vv h e H h (tl h ) 

where u h and v h are the respective prolongations of vectors u and v, (/, v) is 
the Euclidean scalar product in IR^. 

The main goal of this work is to construct a symmetric positive definite 
preconditioning operator B for problem (1.4) so as to satisfy the inequalities 

ci(Bu,u) < (Au,u) < C2(Bu,u) Vw € IR* (5) 

where positive constants c\ and C 2 are independent of h\ the multiplication 
of a vector by B~ x should be easy to implement. 

The preconditioner B is constructed by using the method of fictitious 
space [10] in two stages. At the first stage, we pass from an arbitrary un- 
structured triangulation D h to an auxiliary structured non-hierarchical mesh, 
and at the second stage to a hierarchical mesh (a square mesh on a square 
containing the original domain 0). Note that the passage from an arbitrary 
triangulation to a structured mesh was earlier used in [11]. This paper in- 
cludes some development of [13] for the case of locally refined grids. Another 
technique for constructing the preconditioners on unstructured meshes was 
proposed in [8,9,10,17]. The construction of preconditioning operators on 
non-hierarchical grids was considered in [6]. 


2 REDUCTION TO A STRUCTURED 
MESH 

The preconditioning operator B in (1.5) is constructed on the basis of the 
lemma of fictitious space [11]. For convenience, we give this lemma here. 
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Lemma 2.1. Let Ho and H be Hilbert spaces with the scalar products 
(u Q , v 0 )h 0 an d ( u i v )h, respectively. Let Ao and A be symmetric positive def- 
inite continuous operators in the spaces Ho and H : 

A 0 : Ho ^ Ho, A: H -► H . 

Suppose that R is a linear operator such that 

R: H-+H 0 

(A 0 Rv,Rv) Ho < c r (Av,v) h Wv e H 

and there exists an operator T such that 

T: Ho —■ ► H , RT u 0 = uo 

c t (ATu 0 , Tu 0 )h < (A 0 u 0 ,uo)h 0 Vti 0 G ffo 
where cr and ct are positive constants. Then 

c t (Aq X uo,Uo)h 0 < (iL4 -1 i2*u 0 ,u 0 )flb < Ci ? (Ao 1 u 0 ,«o)if 0 Vu 0 € H 0 ■ 

The operator R* is adjoint to R with respect to the scalar products (uq, Vo )r 0 
and (u, v)r: 

R*: H-*H 0 

(R*u 0 , v)h = (« o , Rv)h 0 • 


Note that for constructing and implementing the preconditioner, i.e., the 
operator RA _1 R*, we only require the existence of the operator T. In our 
case, the role of the operator A 0 is played by A of (1.4), and the role of the 
space H 0 by Hh{Llh)- In order to use Lemma 2.1, we construct a fictitious 
(auxiliary) space and the corresponding operators. To do this, we embed 
the domain Cl in a square II. Let Ki denote the union of triangles in the 
triangulation Cl h which have a common vertex z,-, and let d; be the maximum 
radius of circle inscribed in Ki. In the square II, we introduce an auxiliary 
grid lift with a step size h such that 

h < — “7= min d ; . (6) 

2V2 * W 
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Let us assume that h = l • 2 J , where l is the length of sides of II and J 
is a positive integer. We denote the nodes of the grid IU by , 

j/j) j L J 0, 1, . . . , L 

and the cells of lU by Dij , 

Dij = {( x,y ) | Xi < x < x i+ i, yj <y < y j+ 1 } 

H h = U . 

i,j=0 

Let Q h denote the minimum figure that consists of cells D^ and contains 
D h \ Q, h C Q h ] let S h be the set of boundary nodes of Q h . We subdivide the 
set S h into two subsets Sq and as follows: if 

Dij nr o /0 

all nodes of Dij f! S h are in Sq 

S l = S h \S%. 

Using cell diagonals, we triangulate Q h and iU; hereafter, the designations 
Q h and II^ refer to triangulations as well. Let Hh(Q h ) be the space of real 
continuous functions which are linear on the triangles of Q h and vanish at 
the nodes of Sq. It is the space Hh{Q h ) that will be used as the fictitious 
space in Lemma 2.1. 

We now define the projection operator R 

R: MQ h ) - MD h ) 

the extension operator T 

T: H h (D h ) - H h (Q h ) 

and an easily invertible operator in the space Hh(D h ). 

Let us begin with the operator R. For a given mesh function 

U h (Zij) e H h (D h ) 
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we define a function u h G H h (D h ) as follows. Let z\ be a vertex in the 
triangulation D h ] assume that zi € Ay- We put 

= (TU h )(zi) = U\Z tj ) . (7) 

The function u h is equal to zero at nodes £ Tj. 

Then, let us define the operator T. For a given function u h € H h (D h ), 
we define a function U G H h (Q h ). The function U h is equal to zero at nodes 
Zij G Sq. At the other nodes, U is defined as follows. If a cell Aj contains 
a certain vertex z\ of the triangulation D h , we put 

UHZij) = (Tu h )(Zij) = u\z,) . 

For each of the remaining nodes G Q h , we find the closest vertex zi of 
the triangulation D h (if there are several closest vertices, we can choose any 
of them) and put 

U\Zii) = (Tu h )(Zi,) = u\z,) . 

Finally, in the space Hh{Q h ) we define the operator Aq\ 

(AqU,V)=[ ( (vU h ,vV h ) + U h -V h )dxdy VU h ,V h eH h (Q h ). (8) 

JQh 

where U h and V h are the respective prolongations of the vectors U and V. 

Theorem 2.1. There exist positive constants C 3 and C 4 , independent of 
h, such that 

c 3 (A~ 1 u, u ) < (RAq'R^u, u ) < c 4 (A _ 1 u, u) Vu G IR^ . 

Here A, R and Aq are operators o/(1.4), (2.2) and (2.3), respectively; R* 
is the transpose of R (we hereafter use the same designation for an operator 
and its matrix representation). 

Proof. The theorem easily follows from Lemma 2.1, condition (2.1) and 
the familiar equivalence of A 1 -norms of finite-element functions in the spaces 
Hh(O h ), Hh{Q h ) and the difference counterparts of these norms [14]. 
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Remark 2.1. The implementation of the operator R is equivalent to the 
piecewise constant interpolation. It is easily seen that the number of arith- 
metic operations required for multiplying R or R* by a vector is proportional 
to the number of nodes in the mesh domain. 

Thus, the construction of a preconditioning operator on an unstructured 
triangulation is reduced to the construction of a preconditioning operator for 
Aq. The latter problem is considered in Section 3. 


3 FICTITIOUS SPACE AND MULTI- 
LEVEL DECOMPOSITION METHODS 

In order to find a preconditioning operator for Aq, we again use Lemma 2.1. 
Here the fictitious (auxiliary) space is Hh(R h ) which consists of piecewise 
linear continuous functions vanishing on the boundary <9n of the square n. 
Efficient preconditioning operators in Hh(R h ) are well known; in particular, 
we may use the BPX preconditioner [4]. To do so, we use the following 
construction. 

We divide the domain n \ into two non-intersecting subdomains such 
that 

n\ = To U Ti , Go n Gi = 0 

( 9 ) 

dGo ndn = T 0 , <9Gi n dn = f ! . 

According to (3.1), we represent the triangulation n^ 1 \ Q h as a union of two 
non-overlapping parts: 

U h \Q h = G h 0 U G\ 

where Gq and Gj are mesh approximations of the domains Go and G\, re- 
spectively. Further, we denote 

G = n U Ti U G x , G h = Q h U Gi 

Hh(G h ) finite-element space of functions vanishing on dG h . We consider in 
the sequence of grids 

n;,n‘,...,n; = n* 
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with step sizes 

ho = l , hi = l ■ 2 1 hj = h = l ■ 2~ J . 

We triangulate these grids and consider the corresponding fini te-element, 
spaces 

w$ c Wi h c . . . c w) = # /l (n /l ) . 

By we denote the nodal basis of the space Wj 1 , l = 0, 1 

First, let us examine the case of Id = F; accordingly, here = S h . By 
we denote the restriction of the basic function onto Q h . We put each 
function U h G Hh(Q h ) in correspondence with a function U h G Hh(Il h ): 

f u h (Zij) , Zn € g* 

^(^•) = 

l o, ^en A \g A . 

Define 

CN l U k = 't £ (C/' l ,<l>S' ) ) i . a (n) W k €H k (Q h ). 

supp$TnQ h ^0 

Theorem 3.1. There exist positive constants c 5 and c 6> independent of 
h, such that 

c 5 (A~ 1 u,u) < ( RCfj l R*u,u ) < c & (A~ 1 u,u) \/u G IR^ . 

Proof. Let us define 

R N : H h ( H h ) - H h (Q h ) 

to be an operator of restriction on Q h : 

(R N U h )(Z ij ) = U h (Z i3 ) VZijZQb. 

If we subdivide the nodes of II' 1 into two groups: (1) the nodes of Q h (includ- 
ing those of S h ), and (2) the remaining nodes, then we obtain the following 
matrix representation for Rn (see also [1]): 

Rn = (10) 
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where I is the identity matrix corresponding to nodes of group ( 1 ), and 0 is 
the zero matrix corresponding to nodes of group (2). It is evident that 

l|flwC‘l|jr<(M < ||C ft || H i ( nh) w‘ e H h (U h ) . 

By the theorem of extension of mesh functions [ 6 ], there exists the extension 
operator 

T n : H h (Q h ) -> H h (U h ) 

uniformly bounded with respect to h. 

According to Lemma 2.1 and [4], there exist positive constants C 7 and c 8 , 
independent of h, such that 

c 7 {Aq X U, U) < (RnC^R^U, U) < c & {A- q l u,u) W 

where Aq is the operator of (2.3) and the definition of C ff 1 is 

J N, 

Cn'U h = W‘ € H h ( IC) . 

1=0 i = 1 

Taking into account the explicit form of Rjv, we complete the proof of The- 
orem 3.1. 

Then, let us examine the case of the Dirichlet problem, i.e., To = T and, 
accordingly, S£ = S h . We define the preconditioner as follows: 

ci L u h = 't Y. vc 1 g u k (Q h ). 

l =° supp^°C Q h 


Theorem 3.2. There exist positive constants eg and cw, independent of 
h, such that 

Cq(A~ 1 u,u) < (RCq 1 R*u,u) < cio(A _1 u,u) Vu G IR^ . 


Proof. In this case, the equivalence of the operators Aq and Cq easily 
follows from the multilevel technique [3,4,15,16] and can be done, for in- 
stance, by using quasi-interpolants from [12]. Then, from Theorem 2.1 we 
get the assertion of Theorem 3.2. 
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Finally, we examine the case of mixed boundary conditions, i.e., T 0 7 ^ 0 
and Ti 7^ 0 . We denote 

C$U h = VU h eH h (Q h ). 

1=0 supp#C G h , 
supp n Q h 7^ 0 


Theorem 3.3. There exist positive constants c\\ and c\ 2 , independent of 
h, such that 

cii(A - 1 u,u) < {RCfl R*u,u) < ci 2 (A - 1 u,u) Vu 6 IR^ . 


Proof. The theorem is proved by using the argument of Theorem 3.2 
and then that of Theorem 3.1. Indeed, at the first step, let us ‘extend’ the 
Dirichlet boundary condition from Sq to the boundary of the triangulation 
II\ To do it, we consider finite element space Hh(G h ) and define 

Ce l U h = 't. E VC/^ € Hh{G k ). 

1=0 supp^ 0 C G h 

Then, according to Theorem 3.2, there exist positive constants ci 3 ,ci 4 , inde- 
pendent of h , such that 

C3||C k ||? r ,, 0 . ) < (C G U,U) < ci 4 ||C ,l ||^i(Qh ) VU k e H h (G h ). 

At the second step, define 

R N ,g : H h (G h ) -> H h (Q h ) 

as a restriction on Q h from G h : 

(R N , G U h )(Z ij ) = U h (Z ij ) VZ i:j g Q h ■ 


Then, from Lemma 2.1 we get 

c 15 (Aq 1 C/, U) < (Rn'gCg'Rn'gU, U) < aeiAjU, U) VU h <E H h (Q h ) 
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where c 15 ,ci 6 are independent of h. Using again the explicit form of Rn,g , 
we complete the proof of Theorem 3.3. 


4 LOCALLY REFINED GRIDS 

In this section we consider a triangulation Q h of the domain Q 

M 

D h = 0 Ti 
i = 1 

and assume D h is regular but not quasi-uniform, i.e., there exists a constant 
s, independent of h, such that 

— < s , * = 1, . . . , M 

Pi 

where r f - and p,- are radii of circumscribed and inscribed circles for the triangle 
Ti , respectively. It means that D h can be locally refined. For this triangulation 
fU, we define the space Hh(D h ) of real continuous functions which are linear 
on each triangle r, of f l h . For the sake of simplicity, we consider the Dirichlet 
boundary condition and assume that the functions from Hh(D h ) vanish at 

r h . 

If we introduce a uniform fictitious grid Q h , then it is possible to modify 
the operators R and T from Section 2 for locally refined triangulation Cl h , 
but realization of a preconditioner will be expensive. 

Let us embed the domain 0 in a square II and start with a coarse uniform 
grid IIq. We refine IIq several times 

Tf/l ITT h 
11 0 5 • • • 

The grid Ilf consists of cells D^j . Let Q f denote the minimum figure that 
consists of cells D^ and contains D h . Denote by / 0 a set of indices (i,i) such 
that 

U of 

(i,j)£lo 
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We define grids , Q %, . . . in the following way. Denote by 7/ a set of indices 
(i,j) such that the cell D® contains more than one vertex of the triangulation 
D h . We divide Dfj and all neighboring cells (which have at least one common 
node with D ^ ) into four congruent sub cells by connecting the midpoints 
of the edges. Denote new cells by and a resulting grid by Qf +l , l = 

0, 1, . . ., which are the minimum figure that contains fl h . We stop this process 
when each cell contains no more than one vertex of D h . Denote by Qj the 
final grid. 

Define a finite-element space Hh(Q h ) as follows: 

H k (Q h ) = { £ 4 0 ) 4 0) +£ £ £ 4' +1) 4' +1) 1 4 ° e m, } 

sup P $ ( fe 0) cQ$ ' =0 b'h)ehsu P p$ ( fc 1+1) n 

We now define the projection operator R 

R: H h (Q h ) -> H h (D h ) 

the extension operator T 

T: H h (D h ) - H h (Q h ) 

according to the definitions from Section 2. 

Define a preconditioning operator in Hh(Qj) in the following way: 

cs‘a i = £ (c\4%<<3;)4°4£ £ £ (c\4' +,) ) M q5)4 ,+1) 

supp$ ( fc 0) c Q>} 1=0 

for any U h G H h (Q h j). 

Theorem 4.1 There exist positive constants cyj and Ci 8 , independent of 
h, such that 

cniA-'u, u ) < (RCr X R*u, u) < ci 8 (A~ 1 u, u) WeJR N . 


Proof. In this case, we again use the equivalence of f? 1 -norms of finite- 
element functions in the spaces H h (O h ), H h (Q h ) and the difference counter- 
parts of these norms and the multilevel technique. 
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MULTIGRID METHODS FOR EHL PROBLEMS 


Elyas Nurgat and Martin Berzins 
School of Computer Studies, University of Leeds 
Leeds, LS2 9JT, UK 


INTRODUCTION 

In many bearings and contacts, forces are transmitted through thin continuous fluid films which 
separate two contacting elements. Objects in contact are normally subjected to friction and wear 
which can be reduced effectively by using lubricants. If the lubricant film is sufficiently thin to 
prevent the opposing solids from coming into contact and carries the entire load, then we have 
hydrodynamic lubrication, where the lubricant film is determined by the motion and geometry of 
the solids. However, for loaded contacts of low geometrical conformity, such as gears, rolling 
contact bearings and cams, this is not the case due to high pressures and this is referred to as 
Elasto-Hydrodynamic Lubrication (EHL) (ref. 1). In EHL, elastic deformation of the contacting 
elements and the increase in fluid viscosity with pressure are very significant and ca nn ot be ignored. 

Since the deformation results in changing the geometry of the lubricating film, which in turn 
determines the pressure distribution, an EHL mathematical model must simultaneously satisfy the 
complex elasticity (integral) and the Reynolds lubrication (differential) equations. The nonlinear 
and coupled nature of the two equations makes numerical calculations computationally intensive. 
This is especially true for highly loaded problems found in practice. One novel feature of these 
problems is that the solution may exhibit sharp pressure spikes in the outlet region (ref. 1). 

To this date both finite element and finite difference methods have been used to solve EHL 
problems with perhaps greater emphasis on the use of the finite difference approach. In both cases, 
a major computational difficulty is ensuring convergence of the nonlinear equations solver to a 
steady state solution. Two successful methods for achieving this are direct iteration and multigrid 
methods. 

Direct iteration methods (e.g Gauss Seidel) have long been used (e.g Hamrock and Dowson 
(ref. 2)) in conjunction with finite difference discretizations on regular meshes. Perhaps one of the 
best examples of the application of such methods is the recent Effective Influence Method of 
Dowson and Wang (ref. 3). Multigrid methods have also been used with great success by Venner 
(ref. 4) and Venner and Lubrecht (ref. 5) with a good summary being given by Venner (ref. 6). 

As both these finite difference discretization based approaches appear to provide an efficient 
way of solving EHL problems, it is important to understand their relative merits. This paper is a 
first attempt at providing such an understanding in the context of EHL point contact problem, 
(contact of two spheres), in which the contact zone is a point and an ellipse or circle for unloaded 
and loaded dry contacts respectively. Since the film thickness and the contact width are generally 
small compared to the local radius of curvature of the two surfaces, the reduced geometry of the 
surfaces in the contact area can be accurately approximated to the contact between a paraboloid 
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and a fiat surface. 

The layout of the remainder of this paper is as follows. In section 2 we introduce the form of the 
equations to be solved. The Effective Influence Newton Method is described in Section 3 while 
Section 4 describes the Multigrid method to be used. Sections 5 and 6 describe the test problems 
to be used in the comparison between the two methods and compare the performance of the two 
methods. Section 7 concludes the' paper with an argument of the two methods and suggests some 
future research directions. 


GOVERNING EQUATIONS 


The Mathematical model describing the isothermal axisymmetric EHL circular contact problem 
consists of three equations. The Reynolds Equation relates pressure, P, to geometry of the gap, the 
film thickness, H, and velocities of the running surfaces. 


d 


dP' 


- , dP\ d 

L ^ ~ dx J + dy [ 6 dy 


d( P H) 

dx 


= 0, x,y <E [-3.5, 1.5] x [-2,2] 


( 1 ) 


with the cavitation condition P > 0 and P = 0 on boundaries. The function e = (pH 3 )/(rjX) 
depends on viscosity, y(P), density, p(P), and film thickness, H(x,y). The remaining terms are 
given by: 

f 1 4. Wh?__ p > o 

P = \ , 1+vphP , (ref. 6); 


1 otherwise 

V - exp {£2a[-l + (1 + ^ P ) z ]} , (ref. 6); 

Ph is the maximum Hertzian pressure given by ph = ; 

a = pressure viscosity coefficient, 2 = 0.68 is the pressure viscosity index; 
A = Tj \! -£7 and po = 1.98 x 10 8 are constants; 


M V 3M 

p, = 5.8 x lO -10 and v = 1.68 x 10~ 9 are empirical constants; 

L and M are the Moes (ref. 6) dimensionless material and load parameters, respectively. For 
lightly loaded problems ph, which is a function of M and L, is about 0.5 GPa. Moderately loaded 
problems have ph in the range of about 1 GPa. 

The Film Thickness Equation, H(x,y), computes the elastic distortion of the surfaces caused by 
the pressure in the film and is written as: 


H(x, y) 


x 2 y 2 2 

Hoo + T + T + V> 


/ OO PCX 

-oo J —c 


P(x ,y') dx dy' 
\J{x-x') 2 + (y-y'Y 


( 2 ) 


where Hqo is a constant. 

The final equation is the Force Balance Equation which ensures that the integral over the 
pressure balances the external applied load: 

/ oo poo 

I P(x, y) dx dy = External Force. 

-oo J —oo 

The nondimensionalisation employed allows the external force to be scaled to (27 t)/3. 


( 3 ) 
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Finite Difference Discretization of Governing Equations 


The focus of this study is on the iterative solution methods for the nonlinear equations and so 
in order to allow comparison with existing results we shall follow most EHL studies and use a 
regular mesh. The governing equations are discretized on a regular rectangular grid with the 
direction of flow in the x direction and the mesh spacings h x and h y in the x and y directions, 
respectively. Due to symmetry, only half the domain is used in the y direction. Reynolds 
Equation (1) is discretized at each non boundary mesh point ((i — 1 )h x + x a , ( j — l)h y — y c ) 

where x, y G [x 0 , x&] x [— y c , y c \, using central and backward differencing to get, (ref. 6), 

e «- — e i — ^».i) T h x h y 1 — Pi,j) + 

1 — Pi,j)) ~ h x (pi t jHi t j — pi-ijHi-ij) = 0 ( 4 ) 

where e i+ i •, e,-_i -, e— + 1 , e—_i denote the values of e at the intermediate locations midway between 
meshpoints. 

The discretized film thickness equation (2) at a point (i,j) is given by: 


x ? ?/? 

Hij = H 00 + + d, 




( 5 ) 


where Hoo is a constant and J is the elastic deformation of the material due to the applied load as 
defined below. 


Elastic Deformation Integral 

The elastic deformation on the surface of a solid depends on the representation of applied 
normal pressures. The simplest procedure is to divide the pressure distribution into rectangular 
blocks of uni form pressure. The elastic deformation at a point (x, y), d x<y , due to the uniform 
pressure over the rectangular area 2a2b is given by (ref. 6) : 

_ 2 P f b r a dx i dyi 

1 r 2 J- b yj(x - xi ) 2 + {y - yi ) 2 

If the entire domain is divided into equal rectangular areas, then from Dowson and Hamrock 
(ref. 7), the elastic deformation at a point (i,j), djj, due to contributions of all rectangular areas of 
uniform pressure is given by: 





7 r 


o Trix n y 


Jh=i l=i 


( 7 ) 


where m = \i — k\ + 1, n = \j — l\ + 1, m x and n y are the maximum number of points in the x and y 
directions, respectively. The coefficients K m ^ n are given by: 



f x q + \/yq+ x \ \ 
V^P+V^f +®p/ 


+ |x,| In 


( yq + \/ x2 q+yq \ 

\y P +V*t+yl) 
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where 

Xp = Xi - x k + % x q = Xi -x k -!f y p=yj -y l + !h. y q = y .__ yi _h_ < 

One advantage of a regular mesh is that the m x n y coefficients need only be calculated once and 
stored. In contrast, on an irregular mesh it is necessary to store m x n y coefficients for each mesh 
point. 

The force balance equation (3) determines the value of the integration constant H 00 and is 
discretized as follows: 


TTlx 


h X hy ^2 ^2 Pi,j 


= 1 j=l 



( 8 ) 


The system of equations (4), (7) and (8) thus constitutes a system of integro-differential 
equation s. The initia l pressure distribution is given by the Hertzian pressure profile, (ref. 6). That 
is P = y/\ — x 2 — y 2 if x 2 + y 2 < 1 otherwise P = 0. 


EFFECTIVE INFLUENCE NEWTON METHOD, [ref. 8] 


For EHL problems, when Newton’s method is used, the discretized nonlinear equation is 
linearized and solved using Gaussian elimination or an iteration method. Gaussian elimination may 
be used if the dimension of the coefficient matrix, Jacobian matrix, of the linear system is small. 
For EHL problems, a full Jacobian matrix is required because the elastic deformation at one point 
is determined by the pressure distribution over the entire grid. For a mesh of m x , n y points, this 
results in an often prohibitively large dense system of m x n y equations. It is thus essential to seek 
computationally less expensive methods. 

The Effective Influence Newton Method developed by Wang (ref. 8), to solve EHL problems, is 
a variant of Newton’s method for solving nonlinear equations. This method employs the notion of 
effective infliience to determine the contribution from elastic deformation in the solution of the set 
of approximate linear equations used in Newton’s formulation of the EHL problem. The elastic 
deformation at a point (i,j) is and must be determined by the pressure distribution over the entire 
domain, though the contribution decreases radially outwards. However, when obtaining the 
solution of the linearized Reynolds equation by Newton’s method, pressures not close to the point 
(i,j) can be ignored. t 

The elastic deformation at a point (i,j) due to a rectangular area of uniform pressure at some 
other point is strongly influenced by the distance between the two points. This enables us to define 
an effective influence region such that only the pressures within the. region need to be considered 
when solving the approximate linearized Reynolds equation. This results in a banded, rather than 
full, Jacobian matrix, thus reducing the computational work involved in the EHL calculation. 

Suppose P is an approximation to the true solution P, then. at a point = L(P);j ^ 0 

and = L(P)i t j = 0. Taylor’s theorem gives: 



Li, j + 


ny m * dLij 


i=i k = i QPw 


A P ktl + 0((AP) 2 ) 


( 9 ) 
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where Lij is the discretized Reynolds equation (1) at the point (xi,yj). 

If (rrii) and (nj) are the number of effective points, from the point (i,j), in the x and y 
directions, respectively, then the Effective Influence Newton's formula is of the form: 


j+ n j i+m,i 

£ £ 


l—j — Tij k=i—m t 


dLjj 

dP k ,t 


AP k,i + Ljj — 0 . 


( 10 ) 


The simplest form of the Effective Influence Newton’s method makes use of five adjacent nodal 
points in linearizing the original Reynolds Equation. This is the method employed by Dowson and 
Wang (ref. 3) in solving the EHL problems and is of the form: 


dLiJ A P,_ hj + ^h±AP tJ + dLiJ 


dPi 


-id 


dP, 


dp. 


A P 


dU 


t+i.j 




J hj 


+u 


dPi 


j-l 


A nnew ^ P»ij 

1 ft p. r 


A P° ld 

lXr i,j + 1 


( 11 ) 


For a constant j, equation (11) results in a tridiagonal system of equations which are solved 
simultaneously using I-line relaxation, provided that and AP° l J +1 are known. In every 

iteration the correction term A Pjj is evaluated on the entire grid. Having obtained AP, a new 
approximation P;j to Pij is computed on the entire grid using: 

Tij = Pj-WAPij (12) 

where IT is a damping factor in the range 0.09 to 0.2. 

The new values of pressure are then used to calculate the elastic deformation, di,j, and the film 
thickness constant, H 0 o, of the film thickness equation (5). Poo is updated using the force balance 
equation (8) and is given by: 


2tt ny 

Hqq = Hqo c( „ h x hy ^ ^ ) ) Pi j ) 

6 i = 1 j=l 


(13) 


where c is a small constant taken, here as 10 2 . 

The technique employed to analyze the convergence of the solution is based on the change in 
the solution from one iteration to the next. Thus, the ERROR on the k th iteration is given by: 


ERROR = 


E m x y->«y I pk pk— 1 I 

i = 1 2->j = 1 \ r i,j r i,j I 

E m* pk 

i = 1 r i,j 


(14) 


and the iteration is terminated when ERROR < TOL, where TOL is a user supplied tolerance. 
The results of Dowson and Wang (ref. 3) and (ref. 8) show that the method works well for many 
different types of EHL problems. 


MULTIGRID METHOD 

The use of multigrid methods in solving EHL problems is relatively new. This method was 
introduced into the field of Tribology by Lubrecht (ref. 9), who through his extensive work has 
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made multigrid techniques an important technique for solving EHL problems. The use of 
multigrids for solving EHL line and point contact problems has been described by Venner (ref. 6). 

The concept of multigrid iteration depends on the asymptotic nature of errors associated with 
iterative schemes and how the schemes reduce these errors. Smooth error components associated 
with low frequencies are hardly reduced with the classical iterative schemes, thus resulting in a long 
time to converge. The opposite is true for error components with wavelength of the order of the 
meshsize. However, low frequency error components can be adequately represented on coarser 
grids. In a multilevel solver, which makes use- of a series of coarser grids, each error component is 
solved until the component becomes smooth when the procedure is switched onto- a coarser grid. 


Full Approximation Scheme 

FDMG Multigrid Software of Gareth Shaw (ref. 10) is used as a starting point for implementing 
the multigrid technique. FDMG employs Multigrid Full Approximation Scheme (FAS) to solve 
nonlinear systems of partial differential equations using either V or W coarse grid correction cycle. 
Jacobi or Gauss-Seidel iterative method can be used as a smoother. The option for the type of 
restriction is either injection or full weighting. 

EHL problems are nonlinear, thus when using multigrids the standard Correction Scheme can 
not be used; instead the Full Approximation Scheme must be used. In the cavitation region, in 
which negative pressures are computed by the solver, the Reynolds equation is not valid and the 
computed pressures are set to zero in the standard manner (ref. 6). This is treated with the 
multigrid method by using injection near and in the cavitational region when transferring the 
residual and solution to the coarse grid. Full weighting is used in the remaining part of the domain. 
The elastic deformation and the force balance equation gets updated on each grid using the 
updated pressure values. The only substantial modification to FDMG has been to take symmetry 
boundary conditions and cavitation into consideration. The main difference from the scheme of 
Venner (ref. 6) is that he uses a combination of Jacobi and Gauss Seidel rather than the Gauss 
Seidel scheme used here. 


Relaxation 

The solution for the isothermal point contact problem is obtained by I-line relaxation due to 
strong coupling in the direction of flow, x direction. The discrete equations are solved 
simultaneously on a line of points, sweeping across the grid only in the positive y direction due to 
symmetry. On each line of points, the Effective Influence Method is employed, as described above, 
and a tridiagonal system of equations is solved. The criterion for convergence are based on 
comparing the solutions on two grids with meshsize h and H = 2 h. Thus the error, ERR(h,H), as 
used by Venner (ref. 6) to measure convergence is given by: 

m x n y 

ERR(h,H) = M„££ \pZ - 4*4-1 • (15) 

J=1 j=l 


628 



TEST PROBLEM ONE 


This test problem, which appears in Wang (ref. 8), is solved on a single 151 by 81 grid of 
domain {( x,y ) : —3.5 < x < 1.5,— 2.0 < y < 2.0}. For this moderately loaded problem, the 
values of Moes (ref. 6) dimensionless parameters are M = 99 and L = 16. This in turn gives 
A = 2.397494 x 10 -2 . The maximum Hertzian pressure, ph , at this load is 1.21 GPa if 
a = 2.205645 x 10 -8 . The equivalent Hamrock and Dowson’s (ref. 11) dimensionless parameters 
with U fixed at 5.6102 x 10 -11 are W = 3.4125 x 10 -6 and G = 4865. 

This problem is solved by using the Effective Influence Newton method for 1500 iterations. 
Every 50 iterations the minimum, Hmin, and central, Hcent, film thickness is recorded. Table 1 
shows Hcent and Hmin together with the equivalent minimum film thickness of Hamrock and 
Dowson, HDHmin. The minimum and central film thickness achieved by Wang (ref. 8) after 100 
iterations is 0.28827 x 10 -4 at (I,J)=(1 13,24). Figure 1 shows the profiles of the pressure and film 
thickness along the x-axis. The pressure spike near the outlet is an often observed feature of EHL 
solutions. 


Its 

Hcent 

Hmin @ (I,J) 

HDHmin 

RMS RES 

SumP 

ERROR 

50 

0.2679E+00 

0.1170E+00 (126, 1) 

0.3476E-04 

0.171E-02 

0.9529 

0.144E-01 

100 

0.1316E+00 

0.5548E-01 (113,18) 

0.1648E-04 

0.158E-02 

1.7756 

0.616E-02 

150 

0.1505E+00 

0.7472E-01 (111,19) 

0.2219E-04 

0.149E-02 

2.0660 

0.106E-02 

200 

0.1683E+00 

0.8592E-01 (111,19) 

0.2552E-04 

0.143E-02 

2.1125 

0.294E-03 

250 

0.1787E+00 

0.9148E-01 (111,19) 

0.2717E-04 

0.137E-02 

2.1109 

0.234E-03 

300 

0.1849E+00 

0.9366E-01 (114,18) 

0.2782E-04 

0.131E-02 

2.1052 

0.171E-03 

350 

0.1891E+00 

0.9504E-01 (114,18) 

0.2823E-04 

0.126E-02 

2.1017 

0.123E-03 

400 

0.1922E+00 

0.9610E-01 (114,18) 

0.2854E-04 

0.122E-02 

2.0996 

0.931E-04 

450 

0.1946E+00 

0.9689E-01 (113,18) 

0.2878E-04 

0.117E-02 

2.0984 

0.739E-04 

500 

0.1965E+00 

0.9753E-01 (113,18) 

0.2897E-04 

0.113E-02 

2.0977 

0.605E-04 

750 

0.2024E+00 

0.9949E-01 (113,18) 

0.2955E-04 

0.947E-03 

2.0958 

0.283E-04 

1000 

0.2053E+00 

0.1004E+00 (113,18) 

0.2983E-04 

0.795E-03 

2.0952 

0.163E-04 

1500 

0.2082E+00 

0.1013E+00 (113,18) 

0.3008E-04 

0.583E-03 

2.0947 

0.702E-05 


Table 1: Test Problem One on a single 151 by 81 grid, M=99 & L=16 


Convergence Criteria. Table 1 also shows the errors, associated with the solution, from which 
the accuracy of the solution can be analyzed. If the convergence criteria are based, as in Wang, see 
equation (14), (ref. 8), on the change in the solution from one iteration to the next, labelled 
ERROR in Table 1, then the solution has converged to the order of 10~ 5 . After 100 iterations the 
solution has converged to the order of 10 -2 and the corresponding error value found by Wang 
(ref. 8) is 0.182 x 10 -3 on the same grid. 

The sum of the pressures over the entire grid, labelled SumP in Table 1, also suggests that the 
iteration is converging as the sum of pressures on the final iteration is converging towards 2.0943, 
thus obeying the force balance equation (8). 
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However, if the convergence is based on the Root Mean Square Residual, labelled RMSRES in 
Table 1, then it can be said that the solution may not have completely converged. The reason for 
this is due to the nature of the Reynolds equation. The coefficient e of the Reynolds equation plays 
a vital role in the solving of these equations. The pressures in the contact region, x 2 + y 2 < 1, are 
larger than those in the non contact region. This makes the coefficient e vary by several orders of 
magnitude over the computational domain. Consider the case along the line of symmetry, y=0. In 
the contact region e is very small ranging from 10~ 9 to 10~ 2 , whereas in the non contact region e 
varies from 10 -1 to 10 4 as can be seen from Figure 2. Thus, when e is very small the film thickness 
derivative part of the Reynolds equation dominates, whereas when e is large the contribution from 
the film thickness derivative part is minimal. Figure 2 also shows that the residuals are between 
two and four orders of magnitude smaller in the contact region than in the inlet and outlet regions. 

Y 



- 3.00 - 2.00 - 1.00 0.00 1.00 


Figure 1: Pressure and Film profiles along y=0, Test Problem One. 
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Its 

Hcent 

Hmin @ (I,J) 

RMSRES 

SumP 

ERR(4,3) 

10 

0.187E+00 

0.140E+00 (80, 1) 

0.34945E-02 

1.6123 

0.8557E-02 

20 

0.109E+00 

0.618E-01 (94,31) 

0.32213E-02 

2.1197 

0.1654E-02 

30 

0.143E+00 

0.772E-01 (95,30) 

0.31004E-02 

2.1049 

0.1021E-02 

40 

0.157E+00 

0.831E-01 (95,30) 

0.30013E-02 

2.0965 

0.6970E-03 

50 

0.166E+00 

0.864E-01 (96,29) 

0.29144E-02 

2.0924 

0.5218E-03 

60 

0.172E+00 

0.887E-01 (96,29) 

■ 0.28358E-02- 

2.0907 

0.4102E-03 

70 

0.177E+00 

0.907E-01 (96,29) 

0.27631E-02 

2.0889 

0.3278E-03 

80 

0.181E+00 

0.922E-01 (96,29) 

0.26951E-02 

2.0874 

0.2913E-03 

90 

0.184E+00 

0.934E-01 (96,29) 

0.26307E-02 

2.0864 

0.2887E-03 

100 

0.186E+00 

0.944E-01 (96,29) 

0.25695E-02 

2.0858 

0.2837E-03 

150 

0.194E+00 

0.977E-01 (96,29) 

0.22968E-02 

2.0853 

0.2559E-03 

200 

0.199E+00 

0.996E-01 (95,29) 

0.20632E-02 

2.0865 

0.2272E-03 

250 

0.202E+00 

0.101E+00 (97,28) 

0.18582E-02 

2.0878 

0.2026E-03 

300 

0.204E+00 

0.102E+00 (97,28) 

0.16774E-02 

2.0888 

0.1826E-03 

350 

0.206E+00 

0.102E+00 (97,28) 

0.15188E-02 

2.0896 

0.1661E-03 


Table 2: Analysis of solution solved using multigrid, 129 by 129, M=99 & L=16 
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It is not possible to use this mesh with the FDMG code, which requires the meshsize on level k 
to be given by 2 fe — 1. Instead meshes between 129 by 129 and 17 by 17 are used with FDMG. The 
results are shown in Table 2 and show broad agreement between the two methods. 


TEST PROBLEM TWO 

This test problem, which appears in Venner (ref. 5), is solved on a single 129 by 129 grid and a 
multigrid where the finest grid is 129 by 129 and the coarsest grid is 17 by 17. Due to symmetry, 
only the nodes in the positive y direction are used. For this lightly loaded problem, the values of 
Moes dimensionless parameters are M = 20 and L = 10. This in turn gives A = 0.2. The maximum 
Hertzian pressure, p^, at this load is 0.58 GPa if a = 1.7 x 10~ 8 . The equivalent Hamrock and 
Dowson’s dimensionless parameters with U fixed at 1.0 x 10~ n are W = 1.8915 x 10 -7 and 
G = 4729. 

This problem was solved using 300 multigrid V-cycles with the results recorded every 10 
iterations as shown in Table 3. The corresponding entries, from a single grid for 1500 iterations 
recorded every 100 iterations, are shown in Table 4. 


Its 

Hcent 

Hmin @ (I,J) 

RMSRES 

SumP 

ERR(4,3) 

10 

0.444E+00 

0.387E+00 (84 , 1) 

0.3368E-02 

1.6055 

0.100E-01 

20 

0.246E+00 

0.158E+00 (97 ,29) 

0.3038E-02 

2.1670 

0.455E-02 

30 

0.349E+00 

0.225E+00 (99 ,27) 

0.2849E-02 

2.1304 

0.160E-02 

40 

0.380E+00 

0.236E+00 (99 ,26) 

0.2700E-02 

2.1080 

0.854E-03 

50 

0.400E+00 

0.243E+00 (99 ,26) 

0.2569E-02 

2.1081 

0.651E-03 

60 

0.417E+00 

0.251E+00 (100,25) 

0.2450E-02 

2.1090 

0.558E-03 

70 

0.429E+00 

0.257E+00 (100,25) 

0.2341E-02 

2.1075 

0.468E-03 

80 

0.439E+00 

0.261E+00 (100,25) 

0.2239E-02 

2.1056 

0.411E-03 

90 

0.447E+00 

0.265E+00 (100,25) 

0.2143E-02 

2.1039 

0.361E-03 

100 

0.454E+00 

0.268E+00 (101,24) 

0.2053E-02 

2.1024 

0.310E-03 

120 

0.464E+00 

0.272E+00 (100,24) 

0.1888E-02 

2.1000 

0.237E-03 

140 

0.472E+00 

0.275E+00 (100,24) 

0.1740E-02 

2.0981 

0.182E-03 

160 

0.478E+00 

0.278E+00 (100,24) 

0.1607E-02 

2.0966 

0.139E-03 

180 

0.483E+00 

0.280E+00 (100,24) 

0.1489E-02 

2.0956 

0.113E-03 

200 

0.487E+00 

0.281E+00 (100,24) 

0.1384E-02 

2.0949 

0.935E-04 

250 

0.494E+00 

0.284E+00 (100,24) 

0.1173E-02 

2.0938 

0.629E-04 

300 

0.499E+00 

0.286E+00 (100,24) 

0.1028E-02 

2.0933 

0.451E-04 


Table 3: Test Problem Two solved using multigrid, 129 by 129, M=20 & L=10 
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Its 

Hcent 

Hmin @ (I,J) 

RMSRES 

SumP 

ERROR 

10 

0.1058E+01 

0.9588E+00 

(107, 1) 

0.3103E-01 

2.2451 

0.138E-01 

100 

0.5642E+00 

0.4797E+00 

(75 . 1) 

0.2430E-02 

1.4367 

0.583E-02 

200 

0.2143E+00 

0.1225E+00 

(100,27) 

0.2215E-02 

2.0141 

0.289E-02 

300 

0.3136E+00 

0.1999E+00 

(98 ,28) 

0.2101E-02 

2.1684 

0.567E-03 

400 

0.3585E+00 

-0.2260E+00 

(100,26) 

0.2017E-02 

2.1243 

0.391E-03 

500 

0.3780E+00 

0.2329E+00 

(99 ,26) 

0.1945E-02 

2.1089 

0.247E-03 

600 

0.3932E+00 

0.2395E+00 

(99 ,26) 

0.1879E-02 

2.1067 

0.194E-03 

700 

0.4062E+00 

0.2455E+00 

(100,25) 

0.1818E-02 

2.1047 

0.163E-03 

800 

0.4167E+00 

0.2503E+00 

(100,25) 

0.1761E-02 

2.1028 

0.137E-03 

900 

0.4253E+00 

0.2543E+00 

(100,25) 

0.1707E-02 

2.1014 

0.117E-03 

1000 

0.4326E+00 

0.2577E+00 

(100,25) 

0.1656E-02 

2.1004 

0.102E-03 

1500 

0.4576E+00 

0.2692E+00 

(100,24) 

0.1431E-02 

2.0976 

0.585E-04 


Table 4: Test Problem Two solved on a single 129 by 129 grid, M=20 & L=10 


Results. The values obtained after 1500 iterations on a single grid, shown in Table 4, for the 
central, labelled Hcent, and minimum, labelled Hmin, film thickness is achieved using 120 multigrid 
iterations as shown in Table 3. Thus 1500 single grid iterations correspond to about 120 multigrid 
iterations. For this problem, Venner (ref. 6) achieved 0.502 and 0.349 for Hcent and Hmin, 
respectively, using a grid of {( x,y ) : —4.5 < x < 1.5,— 3.0 < y < 3.0}. If convergence is based on 
the sum of pressures on the entire grid, labelled SumP, then the value obtained using a multigrid 
method is slightly better than that obtained using a single grid method. Although the change in 
solution from the finest grid, 129 by 129, and the grid just above it, labelled ERR(4,3) in Table 3, 
and the change in solution from one iteration to next on a single grid, labelled ERROR in Table 4, 
are evaluated differently, they both seem to suggest that the solution has converged to the order of 
10 -4 . Vernier’s results quote a value of ERR(4,3) of 0.122. The relative computation times on a 
SGI R4400 workstation for the two methods on this problem are 8:00:00 on a single grid for 1500 
iterations and 7:15:00 for 300 multigrid V-cycles. The multigrid method thus provides a means of 
obtaining solutions with greater efficiency. One potential area of difficulty with the multigrid 
method is that if the coarsest multigrid cannot adequately represent the solution, then the method 
may exhibit convergence difficulties. 

Contour line plots of the film thickness and pressure showing the formation of side-lopes and 
the spike region are shown in Figures 3 and 4, respectively. The cavitated region is clearly shown 
on the right side of Figure 4 and is preceded by the pressure spike region which can be seen more, 
clearly in Figure 5. 
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Figure 5: 3D pressure profile on MG, 129 by 129, M=20 & L=10. 


CONCLUSIONS 

The numerical results shown in this paper demonstrate how even a relatively standard multigrid 
code may be used to speed up the solution of EHL problems. The combination of the Effective 
Influence Method and multigrid method, which are both effective on their own, also appears to 
work well. 

An outstanding issue concerns the treatment of convergence in EHL problems. From a practical 
engineering point of view it is the pressures and film thicknesses in the contact zone that are of 
interest and thus it is changes in these pressures which must tend to zero. The much larger 
residuals in the inlet region where the pressure is close to zero, though of potential cause of 
concern, may not influence the values of pressure in the contact region unduly. Furthermore, the 
Reynolds equation derivation is based on assumptions that are less valid in the inlet region. This is, 
however, an issue that needs to be further explored. 

One possible way of obtaining a better understanding of the relationship between the residual 
and the solution is to compute error indicators in conjunction with adaptive meshes probably using 
a hierarchy of regular mesh patches to resolve the steep gradients in the pressure. It is this 
approach that will be our future research in. this area. 
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SUMMARY 

In this paper we look at Krylov subspace methods for solving the transport equations 
in a slab geometry. The spatial discretization scheme used is a finite element method 
called Modified Linear Discontinuous scheme (MLD). We investigate the convergence 
rates for a number of Krylov subspace methods for this problem and compare with 
the results of a spatial multigrid scheme. 


INTRODUCTION 

Transport equations describe the scattering and re-scattering of particles such as 
neutrons in a nuclear reactor, or light and infra-red radiation in the atmosphere. 
These equations are important, not only in nuclear engineering, but also in the study 
of the effects of greenhouse gases on the climate. A particularly important, although 
simple, model is of a single slab; this leads to integro-differential equations in one 
spatial variable and one angular variable. Unlike elliptic partial differential equations, 
these equations are based on highly non- normal operators, and require special care 
in their numerical treatment, especially for the regimes of physical interest: strong 
scattering, and weak or no absorption. 

In the past decades there has been a great deal of work on numerical methods 
for large scale problems, such as partial differential equations. In this paper we focus 
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on two of them: multigrid methods and Krylov subspace methods, as well as their 
application to transport equations. 

In the past decade there has been an enormous development of Krylov subspace 
methods for non-symmetric and indefinite systems. These methods only require three 
operations to be available for their implementation: linear combinations, inner prod- 
ucts, and matrix-vector products. Of these, it is assumed that matrix-vector products 
are the most complex to compute. As a result they can be efficiently implemented 
on scalar, vector and parallel computers. 

These Krylov subspace methods that have been developed are all based on either 
the symmetric Lanczos, unsymmetric Lanczos, or Arnoldi methods for computing 
bases of Krylov subspaces. These include the CGS (Conjugate Gradient Squared) 
method, which is from the family of methods that uses the unsymmetric Lanc- 
zos method; the GMRES (Generalized Minimal RESidual) method, which uses the 
Arnoldi method; and LSQR (Least Squares/QR) approach, which uses the symmetric 
Lanczos method. 

In addition, Krylov methods allow the easy incorporation of preconditioners. For 
solving Ax = 6, a preconditioner is a matrix B. where Bu can be easily computed 
given a vector u and the system BAx = Bb is easier to solve than the original 
system. Usually this is understood as finding B such that BA is a well conditioned 
matrix. Suitable matrices B can obtained by a number of different means. If A is 
“diagonally dominant”, then B can be simply the inverse of the diagonal of A; other 
preconditioners are based on Gauss-Seidel or SOR iterations; another source is that 
of incomplete factorizations of sparse matrices, for example, ICCG, which combines 
incomplete Cholesky factorization with conjugate gradients. For a preconditioner to 
be incorporated into a Krylov subspace method, it is sufficient to use a routine to 
compute BAu for a given vector u by first computing v = Au and then using a routine 
to compute Bv. 

Another class of algorithms that has been extensively developed in the past decade 
are multigrid, or multilevel, algorithms. These have found a great deal of success in 
dealing with elliptic partial differential equations. Some multigrid methods have 
been developed for solving special cases of transport equations [5, 9, 10]. For one- 
dimensional problems, these can give exceptionally small convergence factors, and 
thus are extremely good methods [4, 5, 9, 10]. The development of parallel software for 
these methods is very time consuming due to the relaxation schemes used. For more 
general problems, and for two and three dimensional problems, the more “generic” 
Krylov subspace methods may be more suitable. 

In this paper the usage of multigrid methods developed in [4, 5, 9] is investigated 
for the case of isotropic scattering with small but significant absorption. This case 
can lead to difficulties with the multigrid method given in [4, 5, 9], as is noted in 
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[6]. In [6] a modified algorithm is developed to handle the case with isotropic scat- 
tering; however here the Krylov subspace technique GMRES is used with the “pure 
scattering” multigrid algorithm to improve its performance and robustness. 


TRANSPORT EQUATIONS 


The description of the neutron transport problem is given in previous papers 
[ 1 , 3 , 5 ]. For steady state problems within the same energy group for the isotropic 
case (by isotropic we mean that the probability of scattering for the particles is the 
same for all directions), the transport equation in a slab geometry of slab width b 
becomes 




di' 

dx 


+ cr t il' = ~cr s 



p’W + q{x,u). 


(i) 


for x e ( 0 , b) and p. G [ — 1 , 1 ]- Here, xj,'(x, p) represents the flux of particles at position 
x traveling at an angle 6 = arccos(p) from the x-axis; a t dx, the expected number of 
interactions (absorptive or scattering) that a particle will have in traveling a distance 
dx-, a s dx, the expected number of scattering interactions; cr a = cr t — cr s , the expected 
number of absorptive interactions; and q(x,fi), the particle source. The boundary 
conditions prescribing particles entering the slab are 


V>(0 ,n) = g 0 (n), ^{b,-n) = gi{n), 


(2) 


for fi € (0, 1). 

This problem is difficult for conventional methods to solve in two cases of physical 
interest: 


1. 7 = a s /a t = 1 (pure scattering, no absorption). 

2. 1 / cr t <C b (optically dense). 

In fact, as a t — > oo and 7 — > 1 , the problem becomes singularly perturbed. 

In this paper, the spatial discretization is a special finite element method called 
the Modified Linear Discontinuous (MLD) scheme (described in the next section), 
which behaves well in the thick limit. 

In a previous paper this discretization has been solved by a multigrid algorithm 
[ 4 ]. This multigrid method was based on a two-cell red-black //-line relaxation [ 5 ] 
with convergence factors of order 0((l/cq/i) 2 ) when a t h 1 , and 0 ((a t h) 3 ) when 
cr t h < 1. 
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Note that these multigrid operators are non-symmetric. Thus if they are used to 
precondition a Krylov subspace method, it must be a non-symmetric method such as 
GMRES, CGS, or QMR. In this paper we focus on GMRES. 

DISCRETIZATION 


The angular discretization is accomplished by expanding the angular dependence 
in Legendre polynomials, and is known as the Sn approximation when the first N 
Legendre polynomials are used. This results in a semidiscrete set of equations that 
resemble collocation at N Gauss quadrature points, fij, j = l,...,iV, with weights 
Wj, j = 1, . . . , N. Since the quadrature points and weights are symmetric about zero, 
we reformulate the problem in terms of the positive values, fij, j = 1, . . . ,ra, where 
n = N/2. We define ^4 = ifi(x,fij) and tpj = for j — 1 The 

spatial discretization is accomplished by the MLD scheme, which uses elements that 
are linear across each cell and discontinuous in the upwind direction. In our grid 
representation, the variable denotes the flux of particles at position x,- in the 

direction fij The nodal equations are 


and 


fh^±2£ l_s± + = 7 £ u k {il>l k + 0 i)fc ) + ql 


o-t 


hi 




k=l 


2 ^ l+ \ = 7 X>*(t + + i * + 2 ^ k ~ 


Hi ^i -1 j ~ V’i+i j JL 

2 _! 22 + ^ = 7 + ^i,k) + 


CTf h. 

tl, - ^ 


k = l 




1,3 


&t h{ 


+ ti-y = 7 E * + 2 ^tk ~ fc ) + 


Jt=l 


j = 1,. . . , n, i = 1,. . . , m, with boundary conditions 


(3) 

(4) 

(5) 

( 6 ) 


*$j=Sw C+| tj =9lp (7) 

j = 1 ,...,n. 

In our model, x f+ i and x { _i are cell edges, x,- = |(x i+ i + x t -_i ) is the cell center, 
and hi = x i+ i — x { _i is the cell width, 1 < i < m. Equations (3) and (5) are called 
balance equations and (4) and (6) are called edge equations. In block matrix form 
equations (3) — (7) can be written respectively as 

~ tt-l) + + ti) + it ’ ( 8 ) 
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2 ~ ^ ) + tti = + 2 ±i - ±i-0 + aJu* 


-»+ 


■*'+ 


v+i 


i - ± i+ i_) + 0. = 7 R{±t + & ) + £: » 


2Bi{±._ k - 0. ) +± i _± = + 20+ - ±ti ) + i 


i m+ i=£i 


i = 1, . . . , m. Here, 



Ml/ &thi 

0 


‘ 1 ' 

Bi = 



, and R = 

* 


0 

fin/ &th'i 


_ 1 _ 


u>i 


^ n i 


( 9 ) 

( 10 ) 

( 11 ) 

( 12 ) 


( 13 ) 


where pt \ , pi 2 , ...,//„ are the positive Gauss quadrature points, uq, uq , . . . ,w n are the 
Gauss quadrature weights, and 0+^-* is an n-vector: 0+(~) — (4 > ii~\ 4 J tn~' > ) T • 

In the computational grid, the inflow for positive angles is on the left, and the 
inflow for the negative angles is on the right of the whole domain. Figure 1 shows 
the computational domain with 2 m + 1 spatial points and n angles. For a cell pi - line 
relaxation the inflows of each cell are assumed known. For a //-line relaxation for the 
whole domain only the boundary conditions are assumed known. 

Consider cell i. In one-cell //-line relaxation cell, centers 0+ and 07, together 

with the outflow variables 4>~ 1 and 0+, 1 1 will be updated using the following matrix 

— ■ !+2 

equation: 

Au.i = rhs ■■ + rhsj (14) 

where the matrix A is given by 


' I + 2 Bi — jR — 2~fR —2B{ 'yR 

0 I — jR —7 R Bi 

Bi —7 R I — 7 R 0 

7 R —2 Bi —27 R I + 2Bi — ' fR 


(15) 


with 

U.i = ( u~_ i,uf, uj , u+ 1 ) , 

1 2 * ' 2 

rhs} = (0 , BiyA_ , BiuJ + 1 , 0 ) , 

and 

Solving this matrix equation corresponds to performing a //-line relaxation for one 
cell. To solve this system for all variables we consider the cells coupled together; thus, 
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After the discretizations are chosen for equations (1-2) the problem becomes one 
of finding the best methods for the solution of Ax = b , where A is a q x q matrix 
and x and b are vectors of size q. In these methods, iterative solutions of the form 
x k+1 = x k + p k are constructed, where p k G Kk(A,r°). Kk(A,r° ) is the Krylov space 
of dimension k , where k < q and is defined as the span of r°, Ar ° , A 2 r °, . . . , A fc_1 r°. 

The basic conjugate gradient algorithm of Hestenes and Stiefel [2] for symmet- 
ric positive matrices minimizes the residual in the A~ l norm (||a;|U-i = vx T A~ 1 x) 
over Kk(A,r°). After q steps, without roundoff errors, it zeros the residual. For 
nonsymmetric matrices this method does not work. 

In this paper the solution of non-symmetric discretizations is investigated. Thus 
we must consider other Krylov methods. In addition we investigate the use of multi- 
grid methods as preconditioners. 

Krylov subspace methods are based on either the symmetric or unsymmetric Lanc- 
zos methods, or the Arnoldi method, applied either to A or to a closely related ma- 
trix. The symmetric Lanczos and Arnoldi algorithms generate (in exact arithmetic) 
orthonormal bases for Kk(A,r°), while the unsymmetric Lanczos produces a pair of 
biorthonormal bases for Kk(A,r°) and Kk(A T , r°), respectively. In both cases the 
Lanczos methods produce a tridiagonal matrix that represents the original matrix on 
the Krylov subspaces, while the Arnoldi method produces a He'ssenberg matrix that 
represents the matrix on the Krylov subspace Kk{A,r°). The unsymmetric Lanczos 
process is fast, but can suffer from numerical instability, known as breakdown. There 
are variants of these based on the look-ahead Lanczos algorithm, which is a stabilized 
version of the unsymmetric Lanczos method. 
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One of the most commonly used non-symmetric Krylov subspace solvers is GM- 
RES. This method minimizes the residual over all solution vectors of the form x° +p k 
where p k lies in Kk(A,r°). 


MULTIGRID 

To illustrate the multigrid scheme we consider it in the form of two grid levels. We 
use the notation h to indicate a fine grid and 2 h to indicate a coarse grid, although our 
grids are not really assumed to be uniform. Let L h denote the fine grid operator; L 2h , 
the coarse grid operator; and I$ h and I 2h , the interpolation and restriction operators, 
respectively. Let v x and v 2 be small integers (e.g., v\ = v 2 = 1), which determine the 
number of relaxation sweeps performed before and after the coarse grid correction. 
Then one multigrid v 2 ) cycle is represented (in two-grid form) by the following: 

1. Relax v\ times on L h u h = f h . 

2. Calculate the residual r h = f h — L h u h . 

3. Solve approximately L 2h u 2h = I 2h r h . 

4. Replace u h <— u h + I 2h u 2h . 

5. Relax v 2 times on L h u h = f h . 

The coarse grid operator, L 2h , is defined as 

l 2h = . 

For the isotropic scattering the multigrid scheme was applied with regard to the 
spatial variable in [4, 5]. 

Figure 1 illustrates grid points on the fine grid and on the coarse grid. The inter- 
polation and restriction operators for our previous multigrid schemes for transport 
equations were defined in [4, 5]. The L h operator is given in (15). The coarse grid 
operator L 2h has the same form as L h , but on the new grid. 

NUMERICAL RESULTS 

The numerical results presented here are for the isotropic transport equations, 
both with and without absorption. The methods used are mostly the multigrid 
method of [4, 5, 9] for isotropic transport equations without absorption, by itself, and 
this method used as a preconditioner for GMRES. The methods were implemented 
using the Meschach matrix library in C [12] and were run on a Sun SPARC 20. The 
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Figure 1: Computational Grid 


test problems used had 64, 256, or 1024 cells, 16 angles; a t h has the values 10 1 , 10 2 , 
10 3 , and 10 4 , under several different regimes for 7 = a s /a t . These absorption regimes 
are 7 = 1 — l/(<r t h) 2 , 7 = 1 — l/(a t h) 3 , 7 = 0.99, and 7 = 1 (no absorption). The 
size of the test problems range from 4096 unknowns to 65 536 unknowns; cr t ranges 
from 640 to 1.024 x 10 7 . 

The convergence factors were estimated for randomly generated solutions. The 
convergence factor estimate was obtained by taking the geometric average of the ratios 
of the norms of the residuals obtained from the last 5 iterations for each method, 
except where roundoff error caused the residual norm to plateau. 

Note that in the tables an entry of the form 0 .xxx(i:y) means O.xxx x 10 ±3/ . 

The convergence factor estimates are given in Table 1 (7 = 1 — l/(a t h) 2 ), Table 2 
(7 = 1 — l/(cr t h) 3 ), Table 3 (7 = 0.99), and Table 4 (7 = 1). The first regime is both 
of physical interest and also is the more difficult to solve using the standard MLD dis- 
cretization and the simple interpolation and restriction operators. This corresponds 


# cells 

method 

cr t h 

10 1 

10 2 

10 3 

10 4 

64 

MG 

0.262 

0.736 

0.885 

0.931 

MG+GMRES 

0.0487 

0.191 

0.236 

0.129 

256 

MG 

0.263 

0.741 

0.900 

0.952 

MG+GMRES 

0.0477 

0.208 

0.550 

0.213 

1024 

MG 

0.263 

0.741 

0.905 

0.950 

MG+GMRES 

0.0454 

0.208 

0.695 

0.219 


Table 1: Convergence factors for 7 = 1 — l/(a t h) 2 
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# cells 

method 

< 7th 

■OK 

EtmM 

^plliisgsig 


64 

MG 



0.266 

0.046 

MG+'GMRES 

E 


0.677(— 2) 

0.176(— 2) 

256 

MG 

H 

0.844 

0.722 


MG+GMRES 

EH 

0.165 

0.0559 


1024 

MG 

0.488 

0.896 

0.895 


MG+GMRES 

0.122 

0.484 

0.255 

EfkSjB 


Table 2: Convergence factors for 7 = 1 — l/(cr t h) 3 


# cells method 

a t h 

EHi 




64 MG 

MG+GMRES 

nm 

0.0530 
0.215(— 2) 

0 . 1 11 ( — 2 ) 
0.186(— 4) 

0. 121 ( — 4) 
0 . 221 (— 6 ) 

256 MG 

MG+GMRES 

0.263 

0.0477 

0.0530 
0.285(— 2) 

0 . 111 (— 2 ) 
0.190(— 4) 

0 . 121 (— 4) 
0.216(— 6) 

1024 MG 

MG+GMRES 


0.0530 
0.279(— 2) 

0. 1 1 1 ( — 2 ) 
0.189(— 4) 

0 . 121 (— 4) 
0 . 222 (— 6 ) 


Table 3: Convergence factors for 7 = 0.99 


# cells method 

a t h 





64 MG 

MG+GMRES 

0.320(— 4) 
0.681(— 5) 

0.206(— 6) 
0.710(— 7) 

0.119(— 5) 
0.196(— 6) 

0.116(— 3) 
0.987(— 5) 

256 MG 

MG+GMRES 

0.323(— 4) 
0.105(— 4) 

0.207(— 6) 
0.910(— 7) 

0.187(— 4) 
0.354(— 5) 

0.189(— 2) 
0.135(— 3) 

1024 MG 

MG+GMRES 

0.324(— 4) 
0.160(— 4) 

0.299(— 5) 
0.649(— 6) 

0.303(— 3) 
0.230(— 4) 

0.0321 
0.182(— 2) 


Table 4: Convergence factors for 7 = 1 
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to a situation in which the scattered particles undergo a large number of scatterings; 
in addition they have a significant probability of being absorbed in a cell, and also 
of “escaping” a cell. The numerical difficulty of the problem is clearly evident in the 
convergence factors obtained. 

Results for diverse Krylov subspace methods using diagonal and ILU (Incomplete 
LU factorization) preconditioning are reported in [11], but they were only obtained 
for relatively small values of cr t h. These methods do not seem adequate for the very 
large values of <r t h that are studied here. For example, there it is reported that 
the convergence factor for 100 cells, 4 angles, and a t h — 1, using GMRES with an 
ILU preconditioner, was 0.705 and clearly deteriorates as the number of cells and 
a t h increase. In contrast, with the multigrid method either used directly or as a 
preconditioner, the convergence factor for 256 cells, 16 angles, and a t h — 10 was 
0.734 x 10 -3 in the “no absorption” case. 

The worst regime for absorption is that with 7 = 1 — l/(cr t h) 2 . In this regime, 
deterioration in the rates of convergence for both the direct multigrid and the GM- 
RES/multigrid methods is evident. Nevertheless, with GMRES, the convergence rates 
are significantly faster and would give overall rates of convergence at least twice as 
fast and up to nearly a factor of 30 faster. Since each step of GMRES only requires 
one matrix- vector multiplication for the operator and for the preconditioner and has 
negligible overhead, preconditioning would give improved overall speed. The multi- 
grid methods of [6] appear to give much better convergence factors, but at the cost of 
additional complexity of the algorithm, not to mention the additional effort needed 
to perform the analysis to design the correct operators for handling this case. 

Outside this regime, the GMRES/multigrid algorithm works consistently better 
than the direct multigrid algorithm, and where the original multigrid algorithm per- 
forms well, the GMRES/multigrid algorithm improves the convergence factor by a 
factor of as much as 100. However, in these cases it would only roughly halve the 
number of iterations needed to achieve a small error tolerance. As noted for the most 
difficult regime, where the original multigrid algorithm has difficulty, using it as a 
preconditioner for GMRES gives much better results. 

CONCLUSIONS 


In this paper the multigrid method for isotropic transport equations of [4, 5, 
9] for the “no absorption” case is applied to problems with absorption both as a 
pure iterative method and as a preconditioner for GMRES. In all cases, GMRES 
improves the convergence factor, although the value of this appears to be much greater 
for the cases in which the nonabsorption multigrid algorithm has difficulty (such as 
the absorption regime 7 = 1 — l/(a t h) 2 ). The multigrid algorithm thus has been 
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demonstrated as an efficient preconditioner for GMRES. Together they are robust 
and, in addition, work well for the absorption regime. We expect multigrid methods 
to work well for the other Krylov subspace methods which we have used, such as 
CGS, LSQR, and CGNE, for which preconditioning is essential. 
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SUMMARY 

Existing multigrid techniques are used to effect an efficient method for recon- 
structing an image from noisy, blurred data. Total Variation minimization yields a 
nonlinear integro-differential equation which, when discretized using cell-centered fi- 
nite differences, yields a full matrix equation. A fixed point iteration is applied with 
the intermediate matrix equations solved via a preconditioned conjugate gradient 
method which utilizes multi-level quadrature (due to Brandt and Lubrecht) to apply 
the integral operator and a multigrid scheme (due to Ewing and Shen) to invert the 
differential operator. With effective preconditioning, the method presented seems to 
require 0(n ) operations. Numerical results are given for a two-dimensional example. 

INTRODUCTION 

The problem of reconstructing an image from noisy, blurred data can be repre- 
sented by the model equation 

z = Ku + e, (1) 

where AT is a smoothing operator, e is noise, and u is to be recovered from noisy data 
z. K is typically a Fredholm first kind integral operator, ( Ku)(x ) = / k(x,y)u(y)dy, 
which is compact, so problems of this form are ill-posed; i.e., small perturbations in 
the data will produce wildly varying it’s. 

In the past, attempts to apply multigrid techniques to inverse problems similar to 
this have produced rather disappointing results. Either multigrid has been applied 
directly to (1) without stabilization (see [1] as an example) which produces poor 
quality reconstructions for high noise-to-signal ratios (due to the ill-posedness of the 
problem), or stabilization has been applied, but multigrid displays slow convergence 
(see [2]). In this paper it will be demonstrated how to overcome these difficulties with 
existing multigrid tools, obtaining a fast algorithm to approximate u in (1). 

To stabilize problem (1) Tikhonov regularization, or penalized least squares, is 
used: 

minT Q (u), where T a (u) = ~\\Ku — z\\ 2 2 + aJ(u), (2) 

1 Research was supported in part by a DOE-EPSCoR graduate fellowship. 
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where a is a positive parameter, and J is a known functional. 

A common choice for J is 

J(v) = / |V«| 2 , (3) 

J 

but this assumes u G H l (Q). Hence, it is unsuitable for image processing applications, 
where one wants to recover sharp edges, i.e., discontinuous u. 

In their seminal paper on Total Variation-based denoising [3], Osher, Rudin, and 
Fatemi considered the functional 

J(u) = f |V»|. (4) 

J 

To overcome difficulties associated with nondifferentiability at Vw = 0, consider the 
modification 

J p(u) = j^\Vu\ 2 + /3dx, p> 0. (5) 

For (3 = 0, J /3 is the total variation of u. Figure 1 (excerpted from [4]) depicts a 
comparison of reconstructions of u in (2). In subplot B, J is as in (3), hence the 
reconstruction is smooth; in subplot C, J = Jp as in (5), and a blocky image is 
recovered; and subplot D shows a filtered Fourier reconstruction of the data. Clearly 
Total Variation produces a superior reconstruction in this test case. 

Minimizing T a as given in (2) with J defined as in (5) yields the nonlinear integro- 
differential equation 

du 

K*(Ku — z) + aVJp(u) = 0 for x € Q, and — = 0 for x E d£l. (6) 

This can be written in operator form as 

Ku + aL{u)u = K*z, (7) 

where 

K = K*K (8) 

and L(u ) is the diffusion operator whose action on a function v is given by 



Note that both K and L(u) are symmetric positive semidefinite operators. 

The following fixed point algorithm [4] can then be applied’ to handle the nonlin- 
earity: 

(K + aL(u^))k u+1 ^ = K*z, u = 0, 1, . . . (10) 

At each iteration it is necessary to solve a non-sparse linear system. This paper 
presents multigrid techniques for solving these systems efficiently. 

The Denoising section deals with the case when K is the identity operator, the de- 
noising problem. The Deconvolution section returns to the original problem (1) where 
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A) Exact and Noisy Data 


B) Sobolev H-1 Reconstruction 




C) TV Reconstruction D) Fourier Reconstruction 




Figure 1: Denoised reconstructions obtained using various filtering techniques. Dot- 
ted lines represent noisy data. Solid line in subplot A is exact solution. Solid lines in 
subplots B-D are reconstructions. 

K is a Fredholm first kind integral operator. Included are discussions of multi-level 
integration, preconditioning, and a recapitulation of the algorithm. The Numerical 
Results section discusses observed convergence rates for the numerical implementation 
and includes a two-dimensional example. 

DENOISING 


First, consider the case Ku = u. This corresponds to denoising an image, and 
(10) is reduced to 

(1 + aL(« M ))M (l/+1) = 2 , z/ = 0, 1, — (11) 

At each iteration it is necessary to solve a linear diffusion equation whose diffusiv- 
ity depends on the previous iterate u^. This iteration is referred to as a “lagged 
diffusivity fixed point iteration,” and is denoted here as FP (see [4] for details). 

Note that the diffusion coefficient 1 / \/| Vzz | 2 + (3 is poorly behaved where Vu is 
large. The cell-centered finite difference discretization [5] is applied to overcome this 
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Figure 2: The spectrum of the discretized operator I + aL(u) for a fixed u in one 
space dimension. 

difficulty. After discretization, one must solve a sparse, block tridiagonal matrix 
equation to obtain u ( " +1) at each fixed point iteration. Figure 2 shows the spectrum 
of the operator from (11) for a fixed u. A preconditioned conjugate gradient method 
has been employed with a multigrid preconditioner developed by Ewing and Shen [5]. 

DECONVOLUTION 

Now consider the case when K is a Fredholm first kind integral operator. The 
matrix obtained from the discretization of K + aL(u in (10) is no longer sparse. 
Hence, to use the lagged diffusivity fixed point iteration as before, a full matrix 
equation must be solved for each iteration. The conjugate gradient method can again 
be applied but with a cost of n 2 operations per iteration. In typical 2-D image 
processing applications n 2 « 10 12 ; clearly this operation count is unacceptable. The 
Multi-level integration section describes a scheme for reducing the complexity of one 
conjugate gradient iteration from n 2 to n. 
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Multi-level integration 


In [6] , Brandt and Lubrecht describe a method based on multigrid techniques for 
approximately evaluating Kv which requires only 0(n) operations. The general idea 
is 

•kv « k h v h &n h H (n%v h ) . (12) 

Here h and n indicate the mesh spacing and number of nodes on the fine grid, and 
similarly, H and N indicate the coarse grid with N « n. n# and are coarse- to- 

fine and fine-to-coarse intergrid transfer operators, respectively. 

To evaluate K h v h cheaply, restrict v h to the coarse grid, apply the coarse grid 
operator k H at a cost of 0(N 2 ) operations, and then interpolate K H v H back to the 
fine grid. 

To see the details of this approximation, choose q th order transfer operators: n#, 
a coarse-to-fine mesh transfer (interpolation), and n^, a fine-to-coarse mesh transfer 
(restriction). Using p th order quadrature, the operation becomes 

[Kv](xf) = Jok(xf,y)v(y)dy, I = 1,...,N 
= h £” =1 k{xf, x^))v} + <D(h p ) 

= AE"=i[fc(arf + 0(A") + (13) 

= AEjti i?)[W)V]j + O(W) + O(H’) 


Then [Kv](xf) can be interpolated to the fine grid by with 0(H q ) accuracy. 
The entire application looks like 

K h v h = U h H K H (U h H ) T v h + 0(h? ) + 0(H q ). (14) 

If N 2 w n then H q ~ h p , provided q = 2 p, and this calculation requires only 0{n) 
operations and maintains 0(h p ) accuracy. To see this, let n = 2 lev ( lev > 0 is the 
number of levels, or nested grids) , let n + 1 be the number of points in the finest mesh 
with spacing h = and let the coarsest mesh have N + l points with spacing H = jj 
where N = 2 lev / 2 . With second order quadrature (p = 2), K H U^v h can be calculated 
in 0(N 2 ) = 0((2 lev / 2 ) 2 ) = 0(n) operations. Fourth order transfer operators ( q = 4) 
ensure that the accuracy of U l f I k H U^v h is 0{h 2 ) + 0(H 4 ) = 0(h 2 ). Note that 
= c(n^) T , with c = H/h\ hence, n^Kff(n^) T is symmetric. 

This provides an 0(n) method of applying K which maintains O (h 2 ) accuracy. 
Hence, an iteration of the conjugate gradient method applied to the system (10) will 
use only 0{ri) operations. However, K + aL(u) is not typically well-conditioned. 
The top right subplot of figure 4 depicts the eigenvalues of this operator for a fixed 
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■e- 
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gammaj';alpha=.0003;sigma-.1 ;cond(C(-1 )A)=1 .489 



Figure 3: Eigenvalues of the preconditioned matrix, C l / 2 AC x ! 2 where Lu = — V 2 w, 
C = bl + aL and b is the maximum eigenvalue of K. 

u , a , and j3. Note that these eigenvalues range over three orders of magnitude. 
Preconditioning must be used to improve convergence. 


Preconditioning 


To simplify notation, define 


,4 = I( + aUu) 


(15) 


For insight into the choice of a preconditioner, consider the 1-D case on 0 < x < 1 with 
L(u) replaced by the negative Laplacian and periodic boundary conditions, where K 
is a convolution operator, Ku = Jq 1 k(x — y)u(y)dy , with Gaussian convolution kernel, 
k(x ) = \[\e~ x2 l u2 . Then L has eigenvalues j 2 7r 2 which tend to oo, K has eigenvalues 
-e -0 ' 2 '? 2 / 2 which tend to 0, and L commutes with K. 

7 T 5 

This eigenvalue structure suggests a preconditioner of the form C = bl + aL. 
Then the iteration matrix becomes C -1 / 2 AC~ X I 2 = C~ l A with eigenvalues 


7j = - 


l e a2 i / 2 + an 2 ] 2 


b + aw 2 j 2 


j = 1 , 2 ,... 


(16) 
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KK, sigma=0.1 


A=KK+alpha*L(u), alpha=0.01 




C=bl+alpha*L(u), b=0.6334 newA = sqCinv*A*sqCinv 




Figure 4: Eigenvalues of the discretized operator matrices K = K*K, A, C , and 
C~ l/2 AC~ 1/2 where K is a convolution operator with kernel k(x) = (f)^ e - ® 2 / 0 ' 2 , 
C = bl + aL(u), and L{u) is the nonlinear operator as in FP. 

The jj tend to 1 as j — > oo independent of b. To ensure 7 j m 1 for small values of j, 
choose the largest eigenvalue of K for b, 

b = p(K) = -e- &2 ! 2 . (17) 

7T 

Figure 3 shows the eigenvalues of the iteration matrix C~~ l A, which result from this 
choice of b. Notice that cond{C~ l A) ~ 1 . This implies that the conjugate gradient 
method will converge very rapidly. 

With the more general diffusion operator defined in (9), this choice of b is yet 
reasonable as shown in Figure 4. Here, the eigenvalues of the matrices A, C, and 
C~ l l 2 AC~ l l 2 are shown. Although the eigenvalues are not all near one as in the 
constant diffusivity case, there is still clustering at one. The “stray” eigenvalues 
correspond to jump discontinuities in u. Thus, C = bl + aL(u ) is an effective pre- 
conditioner for this case as well. 
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A fast reconstruction algorithm 


The fixed point iteration and preconditioned conjugate gradient techniques de- 
scribed above can be combined to form an efficient reconstruction algorithm. What 
follows is the outline of such an algorithm for the two-dimensional deconvolution 
problem. This algorithm is used to obtain the numerical results presented in the 
following section. 

® Apply fixed point iteration as in (10). 

® To solve (K +aL{vS v ' i ))u( v+v> = K*z, apply a preconditioned conjugate gradient 
method with preconditioner C = bl + aL(u^) with b = p(K). 

® Within the preconditioned conjugate gradient method, use multi-level integra- 
tion for each application of K. 

• Within each iteration of the preconditioned conjugate gradient method, solve 
equations of form Cv — ( bl + aL)v = / by a preconditioned conjugate gradient 
method with the Ewing-Shen multigrid preconditioner [5]. 

Notice that C = bl + aL is essentially the same as the operator in a fixed point 
iteration of the denoising problem (11). The multi-level integration is 0(n) as shown 
above and in [6]. Therefore, the complexity of the preconditioned conjugate gradient 
method to solve ( K + aL{u^))u^ v+l ^ = K*z depends on the complexity of solving 
Cv = f. 


NUMERICAL RESULTS 


In Figures 5 and 6, the operator K has been taken to be a convolution integral 
operator with kernel, 

k{x) = e~ x2 ^ 2 (18) 

as in the Multi-level Integration section. Figure 5 presents convergence results for 
this 2-D example with noise-to-signal ratio = 1 and kernel-width parameter, a = 
0.075. Subplot A depicts the norms of the differences between successive iterates. 
Subplot B shows the norm of the gradient of T a as in (6). Subplot C plots the 
preconditioned conjugate gradient convergence factor for each fixed point iteration 
where the geometric mean convergence factor is calculated by 


convergence factor = exp 


1 ^ res m+1 , 

M ^ res m ■ 


(19) 
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2468 10 2468 10 


Fixed point iteration Fixed point iteration 



2468 10 12345 

Fixed point iteration PCG iteration 


Figure 5: Subplots A and B show the norms of the differences between iterates and the 
gradient of the function T a , respectively. Subplot C contains the convergence history 
of the preconditioned conjugate gradient method with preconditioner C = bl + aL 
at each fixed point iteration. Subplot D plots the residuals of the preconditioned 
conjugate gradient method for 5 iterations at the tenth fixed point iteration. 

where res m is the residual calculated at the (m — l) st preconditioned conjugate gra- 
dient iterate. Subplot D records the norms of the residuals at each preconditioned 
conjugate gradient iteration for the tenth fixed point iteration. Figure 6 shows the 
noisy data (with noise-to-signal ratio = 1), z = Ku exa£i + e and the subsequent re- 
construction obtained by the above algorithm. 

These results show that the described algorithm can be used to obtain recon- 
structions even for very noisy data. The convergence of the preconditioned conjugate 
gradient method is quite fast as evidenced by Figure 5, Subplots C and D. It is known 
that the multi-level integration method has 0(n ) complexity. Hence, the complexity 
of the preconditioned conjugate gradient method to solve (K +aL(u ( -^))u ( - l/+1 ' ) = K*z 
depends on the complexity of solving Cv = /, where C = bl + aL(u^)). This system 
is nearly identical to the one obtained in the discretization of the denoising problem, 
and for the results given here the same solver has been used, i.e., a preconditioned con- 
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A) Exact solution B) Kernel 





Figure 6: Subplot A shows the exact solution. Subplot B shows the kernel of the 
convolution operator. Subplots C and D show the data with added noise (noise-to- 
signal ratio = 1). Subplots E and F show the subsequent reconstruction with the 
algorithm described. 

jugate gradient with a cell-centered finite difference multigrid preconditioning step. 
This method appears to be nearly 0(n ) in complexity. 
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SUMMARY 


A multilevel algorithm is presented that solves general second order elliptic partial 
differential equations on adaptive sparse grids. The multilevel algorithm consists of 
several V-cycles. Suitable discretizations provide that the discrete equation system 
can be solved in an efficient way. Numerical experiments show a convergence rate of 
order 0(1) for the multilevel algorithm. 


1 Introduction 

In 1990, Bungartz and Zenger used hierarchical bilinear finite elements on a sparse 
grid to discretize Poisson’s equation on the unit square (see [1] and [2]). The discrete 
equation system was solved by a recursive algorithm. Balder extended this idea for 
the solution of the Helmholtz equation in the d-dimensional space (see [3]). 

In this paper, a multilevel algorithm is presented, that solves general second order 
elliptic partial differential equations on adaptive sparse grids. This multilevel algo- 
rithm consists of several V-cycles in one direction and of a Gauss-Seidel relaxation on 
each level. The restrictions of these V-cycles are a semicoarsening. Thus, the multi- 
level algorithm is similar to the multilevel algorithm in [4] and [5]. The Gauss-Seidel 
relaxation and the restriction and prolongation is made like the multilevel projection 
method in [6]. The multilevel cycle of the sparse grid multilevel algorithm is called 
Q-cycle. The problem of this Q-cycle is the calculation of the right hand side during 
the restriction. In case of general second order elliptic differential the exact stiffness 
matrix is so complicated that it is not possible to calculate the right hand side in 
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an efficient way. This means that one multilevel cycle costs more than 0(N 2 ) opera- 
tions, while 0(N log N) is the number of sparse grid points. Thus, it is necessary to 
approximate the bilinear form a corresponding to the elliptic equation. 

We studied two approximations of the bilinear form a. First, the variable coef- 
ficients in the bilinear form were replaced by a piecewise constant sparse grid inter- 
polant. Then, it is possible to calculate the right hand side in an efficient way. But 
even an additional simplification of the bilinear form a is possible. For Laplace’s equa- 
tion some hierarchical basis functions are orthogonal with respect to the bilinear form 
corresponding to Laplace’s equation. Therefore, it makes sense to replace the bilinear 
form a by a simplified bilinear form dh, which has similar orthogonality properties 
even in case of general elliptic differential equations. This gives the discretization 
with semi-orthogonality (see section 3). A convergence with order 0(N~ 1 log N) 
could be proved for this discretization of the Helmholtz equation (see [7]). Numerical 
experiments show the same behavior of convergence as for the original bilinear form 
even in case of more complicated elliptic differential equations. The advantage of the 
semi-orthogonality is that Q-cycle of the sparse grid multilevel algorithm becomes as 
simple as the V-cycle on full grids with bilinear finite elements. The reason for this is 
that nearly the same equations can be used for both multilevel cycles. On every level 
relaxations are made with a nine-point stencil. The restriction and the prolongation 
from one level to another one are made in the same way as in the case of full grids. 
For this it is only necessary to ignore the sparse grid points which are not contained in 
the actual level. This is allowed by the semi-orthogonality. All numerical experiments 
show a convergence rate with order 0(1) for the sparse grid multilevel algorithm. The 
multilevel algorithm requires only 0(N log N) operations per cycle. 

For simplicity, the discretizations and the algorithms in this paper are explained 
only for the regular sparse grids V n . However, it is possible to generalize the algo- 
rithms for adaptive sparse grids. The Q-cycle has been implemented for adaptive 
sparse grids and solves general second order elliptic differential equations. 

Throughout the paper, it is h = 2 -n , where n £ IN and =]0, 1[ 2 . 

2 Sparse Grids and Sparse Grid Interpolation 

The set of one dimensional grid points is 

V = | X) di • ^r|n £ 1N 0 and d Q = 1, di = -1, d 2 ,...,d n £ (1, -1} j U {0}. 

These points are illustrated in Figure 1. For every x £ 'P\{0}, there exist unique 
n £ IN 0 and d 0 = 1, d x = -1, d 2 ,...,d n £ {1,-1} such that x = £" =0 di ' ?• 
Therefore, we can define the depth of a point x £ V by 

T(0) = 0 T(x) = n for x £ V\{0}. 
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The regular sparse grids V n and V n are defined 
by 

V n := {(x,y) € V x V\ T(x) +T(y) < n + 1}, 
V n := X>„n]0,l[ 2 , . 

where n G IN. A more detailed description of 
general abstract and adaptive sparse grids and 
their properties is given in [8] and [9]. 



0.125 0.375 0.625 0.875 


Figure 1. Tree of possible grid points. 


Now, we will define the sparse grid interpolation with piecewise bilinear functions. 
For every x G V and k G IN , we define the piecewise linear function 



( 1 ) 


w k x : [0, 1] i-> R 


and for every (x, y) G V x V and k, l G IN the piecewise bilinear function 

V {L) : & ^ R 

vH y) {x',y') := w k x {x')-w l y {y'). 

The hierarchical basis functions of the point (x, y) G V x V is the function 

, — MM) 

V(x,y) •— v ( x ,y) 

There are two regular finite element spaces for the regular sparse grid V n 

V Vn := span^UsIz G V n } C D and 

V Vn := span R {u z |z G V n } c W\ (Q) (TC(fi). 

There is a unique sparse grid interpolation operator X Vn : C(Q) i->- Vv n such that 

Zv n (f)(z) = f(z) Mz G V n (see [2]). 

The sparse grid interpolation error with piecewise bilinear functions is now: 

Theorem 1 (Sparse Grid Interpolation Error). There exists a constant C > 0 such 
that the error in the W^-norm is for h = 2 -n 


ll/-2i>.(/)lk' < C\\f\\ w a.,h 

ll/-Z»,(/)lk < C\\f\\ w o,Mogh^ 
where 

H a -\Si) := {/ € L 2 ( fi) IH/IUo.i < oo} and 


for f 6 Wf' 4 (0), 
for f € Wf ' 3 (n), 


and 


H o,i 


( 

d i+j f 

\ 

{ 

dx i dyi 

Li' i+j<l, i,j <^+ 1 


h 
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The proof of Theorem 1 is given in [1] and [7]. At the end of this section, we 
define the following full grids and a full grid finite element space 

:= {(s, y) e V X 'PlTix) < k and T{y) < 1} and := ft w n]0, 1[ 2 

o , . o 

and V ’ := span R {v*|z G 

3 Discretization of Elliptic Equations 

We use the same notation as in [10]. Let / G and 

a : W, 1 ^) x IT^L!) • ^ R 
(a, v) i-> 

|a|,|/8|<l 

where a,/3 are multiindices and A = (a a ,/?)| a | G . Let us assume that 

o “ o 

a is continuous and W 2 1 (f2)-elliptic. We are looking for a solution u G W\ (fi) of the 
equation 

a(u,v) = f(v) for all v G (1) 

° ° 

The problem is now that we cannot replace by the finite element space Vv n 

and use the same bilinear form a. If we did so, we would get a manifold of stiffness 
matrices of dimension more than 0(2 n n) for this class of elliptic equations. Then, 
we would not be able to store the stiffness matrix in a sparse grid data structure. 
Therefore, we replace the bilinear form a by an approximate bilinear form. First, we 
replace a by 

a h : x W^{Q) R 

(u,v) x v n { a a,p){D a u){D^v) d(x,y), 

n H,|/J|<i 

where l% n (a a ^) is a suitable sparse grid interpolant. Second, we replace a h by a 
bilinear form a k with a semi-orthogonality property. For the definition of the semi- 
orthogonality property, we need the set of pairs of semi-orthogonal grid points (see 
Figure 2) 

o» := oj. u o;, 

where 

O l h := {{(x,y),{x\y'))eV n *V n \T{x)<T{x') and T(y)>T(y') and 
supp(u( a , ) y)) n supp(u (x /y)) D T* n = 0 and 
supp(t>(* iy) ) n supply)) ± 0}, 

Oil •= {{z,z') G V n x V n \(z',z) G O l h } and 
supp(u) := {z G Q\v{z) ^ 0} for v G C(0). 
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Observe that here the support supp of a function is not compact in general. 
Now we define the semi-orthogonality property of a bilinear form. 
Definition 1 (Semi-Orthogonality Property). 

o o 

A bilinear form b : Vv n x Vv n R has the semi-orthogonality property, if 

b(v z , v z >) = 0 for every (z, z ') G O h . 


(*»!/) 



O 

/ 


no grid points 

o 

(x 


Figure 2. Supports of Hierarchical Basis Functions of Semi-Orthogonal Grid Points. 


A simple calculation shows that the bilinear form (w, v) i-> f n (Vw, Vv)d( x, y) has 
the semi-orthogonality property. In case of general second order elliptic differential 
equations, we define the discrete bilinear form ah by its values on the hierarchical 
basis 


ah : Vv n x Vv n ^ R 


a h {v z ,v z ,) := 


0 for 

ah(v z ,v z ') for 


(z,z') E O h 
(z, z') O h - 


Obviously, a,h has the semi-orthogonality property. The discretization of equation (1) 
with semi-orthogonality is now: 


O 

Discretization with Semi- Orthogonality Find a u h e V Vn such that 


o 

dh(uh,v h ) = f(v h ) for all v h G V Vn . 


4 Multilevel Algorithm 

Let dh be the bilinear form ah or a h - We want to solve the following problem: 
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Discrete Equation System Find u h e V Vn such that 

o 

a h {uh,v h ) = f(y h ) Vv h G V Vn . (1) 

o 

Assume that u G Vu n is an approximate solution of the discrete equation system. 
Obviously, there are \ z G ft such that u = £ c x A 2 u 2 . For fixed k, l G IN, k + l < 
n + 1, we make the following decomposition: 

u = u k ’ 1 + u k J st , 

where 

u k ' 1 = A 2 u 2 G V k ’ 1 and u k J st = E \ z v z 

° o 

z ^^k,l Z = (x,y) e Vn A 

(T(x) > k V T(y) > l) 

For the construction of a multilevel algorithm, we have to push u k ^ st to the right hand 
side. Thus, we define 

f k,l {vh) := f{vh) ~ a h {u k J st ,v h ) for v h G V k ' 1 . (2) 

Now, we can define the 

O 

Equation System of Level (k, l ) Find u k ’ 1 g V k ' 1 such that 

h h {u k ' l ,v h ) = f k \v h ) Vv h G V k ’ 1 (3) 

Naturally, if u = u is the exact solution of the discrete equa^on system, then 
u k ' 1 is the solution of the equation system of level ( k , /). If u k ' 1 is the solution of the 
equation system of level (k,l) for every k + 1 < n + 1, then u = u is the exact solution 
of the discrete equation system. 

For relaxations, it is helpful to form the equation system of level ( k , l)m a matrix 
equation. Therefore, we define the vectors (u k A » , (F k ’ 1 ') » G R^'l and the 

matrix (A k {,) ° by 

V z ^ z Jz,z'en ktl 



= E U * ,l V k z' 1 

O 

(4) 


:= f k ’ l (v k ’ 1 ) and 

(5) 

/\k,l 

:= a h (v k /,v k ' 1 ). 

(6) 


The following matrix equation is equivalent to the equation system of level ( k , l): 
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Matrix Equation of Level (k, l ) 

Find (U k,l '\ ° G such that 

V 2 Jzen kt i 

j^k,ljjk,l _ pk,l 


( 7 ) 


Now, we want to construct a multilevel algorithm. The principal data to be stored 
are: 


• k, l : depth of the actual level. 2 k and 2 1 are the mesh sizes of the full grid 

O 

Q kt i corresponding to the actual level, 

• (£7«) : the actual approximate solution, 

Zxz l^n 

• (F z ) z£ £ : the right hand side of the actual level, and 

• (W z ) ^ : the one dimensional hierarchical surplus in the direction of the last 
restriction. 


First, we have to define a relaxation step in the level ( k,l ). Let 

~k,l . ~k,l 

'U'old *A old ' ^old,rest 

be the decomposition of the actual approximate solution. Assume U z = u k f d (z) for 

O 

all 2 G Qk,i- 


Procedure: Relaxation 

Choose (U z ) ze Q k ( for the start solution of (7). Make a standard relaxation 
step (e.g. Gauss- Seidel-relaxation) of equation (7). This gives the new 
approximate solution (U z ) ze £ lki on the level ( k,l ). 


After one relaxation step, we define u k { w (z) := U z for all 2 G Q kd . u k j w G V k ' 1 
is the new approximate solution on the level ( k , l). The new approximate solution is 
now 


U'new • V'new ^ 


~k,l 

old, rest' 


o 

But after one relaxation, we only have u new (z) = U z for all 2 G Q, k ,i- For the 
propagation of u new to other grid points, we need the procedures restriction and 
prolongation. The procedure prolongation calculates u n ew on the new level. 


Procedure: Restriction in x-direction 

O 

For ( x,y ) G with T(x) = k do 

^F(x, 2/) • 0.5 * {U(x+‘l~ k ,y) T U( x —2~ k ,i /))> 


667 



Procedure: Prolongation in x-direction 

O 

For (x,y) £ Qk,i with T(x) = k do 

U( x ,y) ■= 11 (x,y) + 0.5 * (U( x+ 2 -k >y ) + U( x _ 2 -k 

The procedures Restriction in y-direction and Prolongation in y-direction are de- 
fined analogous. 

The procedures Restriction and Prolongation calculate 

O 

Uz '■= U new (z ) for 2 £ Qk new ,l new 1 

where (k new ,l new ) is the new level. The procedures Restriction and Prolongation can 
do this only if the multilevel algorithm satisfies the following rule: 

Restriction-Prolongation- Rule 

Assume that Restriction in x-direction was used from the level (A/, l') to 
the level (k 1 — 1, l'). Then use Prolongation in x-direction with k = k' — 1 
next time only if 1 = 1'. 

Assume that Restriction in y-direction was used from the level ( k l') to 
the level (k',1 1 — 1). Then use Prolongation in y-direction with l = l 1 — 1 
next time only if k = k' . 

Last, we need the procedure 

Procedure: Calculation of the right hand side 

This procedure calculates F z := F^’ 1 for all grid points z £ 0,^,1 ■ 


AND 


Figure 3: Q-Cycle of the multilevel algorithm on a sparse grid 
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Now we can explain the Q- cycle (see Figure 3): 


THE Q-CYCLE { 

Step 1: Way in x-direction 
LET k := 1; 

WHILE k < n { , 

Step 1.1: V-cycle in one direction 
LET l := n- k + 1; 

WHILE l> 1 { 

Restriction in y-direction; AND l := l — 1; 
Calculate the right hand side ; 

} 

Relaxation ; 

WHILE l < n - k + 1 { 

l := l + 1 AND Prolongation in y-direction; 
Calculate the right hand side; 

Relaxation; 

} 

Step 1.2: Changing k 

Restriction in y-direction; AND l := n — k; 
Calculate the right hand side; 
k := k + 1; AND Prolongation in x-direction; 
Calculate the right hand side; 

Relaxation; 

} 

Step 2: Way in y-direction 
analogously 


Observe that this cycle satisfies the Restriction- Prolongation- Rule. 


The Q-cycle can be implemented in an efficient way. This means that the number 
of operations of one Q-cycle is proportional to the number of grid points. Observe 
that it is enough to find an implementation such that the number of operations of 
every procedure on the actual level is proportional to the number of grid points of the 
actual level. Except for the procedure Calculation of the right hand side, it is simple 
to see how to do this. 

In case of the discretization with semi-orthogonality, the Calculation of the right 
hand side is similar to the full grid case. 

O 

Let us assume that at the beginning of the Q-cycle it is for ( x , y) € V n 


zp jpT(x),n—T(x)+l 

r (*,y) — P (x,y) 


( 8 ) 
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Now, we do the Calculation of the right hand side in the multilevel cycle in the 
following way. After a restriction in x-direction we use the equation 


jpkJ.~\ rpk,l , _ ( i Tpk,l \ ~ ( ~k,l ~kj,—l 

F(x,y) M x,y ) + 2 ^(x,y+ 2~ 1 )) a h \ u u i v ( x ,y) ) 

in the Calculation of the right hand side. After a prolongation in x-direction we use 
the equation 

pk,l _ Tpk,l - 1 _ \ ( pk,l , pk,l \ I pj, (f.k,l _ f.k,l- 1 k,l-l\ 

^(x,y) x,y ) 2\( x ’y- 2 ~ l ) + ^( x ’y+ 2 ~ l )) +ah \ U ,V (x,y) ) ' 

Similar equations must be used after the restriction and prolongation in y-direction. 
At the end of one Q-cycle equation (8) is correct again. 


5 Numerical Results 


Numerical Example 1 (Spectral Radius of the Q-cycle) 

Let e > 0. Then, the bilinear form 

Vv d(x, y) 

is TT 2 1 (r2)-elliptic. We are interested in the spectral radius of the Q-cycle iteration 
matrix on the regular sparse grid V n . Table 1 shows the approximate spectral radius. 
It is very small independent of n and e. 



€ 

0.001 

0.01 

0.1 

1 

10 

100 

1000 

n = 3 

0.1 

0.03 

0.02 

0.01 

0.005 

0.02 

0.1 

n — 4 

0.08 

0.002 

0.01 

0.002 

0.002 

0.01 

0.06 

n = 5 

0.01 

0.02 

0.005 

0.002 

0.005 

0.01 

0.01 

n = 6 

0.01 

0.01 

0.005 

0.002 

0.01 

0.005 

0.01 

n = 7 

0.01 

0.01 

0.003 

0.01 

0.01 

0.01 

0.01 

n = 8 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 


Table 1: Approximate spectral radius of the Q-cycle 


Numerical Example 2 (Convergence of the discretization with semi-orthogonality) 
Let us look to the domain 

T = |(z, y) e ]0, 1[ 2 |0 < x < 1 and 0.5 • (1 + sin(7r • x)) >y> x • 0.25 j . 
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n 

3 

4 

5 

6 

7 

8 

9 

2 {Dn 

1.5e-3 

5.6e-4 

1.9e-4 

5.8e-5 

1.8e-5 

5.3e-6 

1.6e-6 

li— ||2,T>„_i 

2.0 

2.8 

3.0 

3.2 

3.3 

3.4 

3.4 

IHl2.D„ 

— 0O,V n 

5.4e-3 

1.9e-3 

6.0e-4 

1.9e-4 

5.9e-5 

1.8e-5 

5.2e-6 

|| — ||oo,X> n _ 1 

IHU.Pn 

2.0 

2.9 

3.1 

3.2 

3.2 

3.3 

3.4 


Table 2: Convergence of the discretization with semi-orthogonality and rj = 1 


The function 

u(x,y ) = (1.0— exp(x/r]))-(1.0— exp{y/rj)) 

is the solution of the equation u G Wf{^) 
and 

a(u, v) = 0 for all v G W-J (\H) (1) 

with Dirichlet boundary conditions. Let us 
map the domain 4/ by a smooth mapping 
onto the unit square. 

This gives a transformed elliptic equation of equation (1) on the unit square. Now, 
we can solve this equation by the discretization with semi-orthogonality. Thus, we 
get a discrete solution Uh of the equation (1). Figure 4 shows an adaptive sparse grid 
with 1220 grid points. There are more points on the left side of the domain, because 
u is not very smooth for small x. 



Figure 4. Adaptive sparse grid on the domain 
$ for i) = 0.1 


z ^' D n 


Mz)l 2 


We use the following discrete norms ||w|l°°,'£ , n := max zeT> n \ w ( z )\ and |M| 2 ,x> n := 
-. Table 2 leads to the conjecture that Uh converges to u with the order 

M|oo,©„ = 0(h 2 log hr 1 ) and\\w\\ 2 ,v n = 0(h 2 \ogh~ l ). 


I®»l 
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ERROR AND COMPLEXITY ANALYSIS FOR A 
COLLOCATION-GRID-PROJECTION PLUS PRECORRECTED-FFT 
ALGORITHM FOR SOLVING POTENTIAL INTEGRAL EQUATIONS WITH 
LAPLACE OR HELMHOLTZ KERNELS 

J. R. Phillips* 

Dept, of Electrical Engineering and Computer Science 
Massachusetts Institute of Technology 
Cambridge, MA 02139. 

SUMMARY 

In this paper we derive error bounds for a collocation-grid-projection scheme tuned 
for use in multilevel methods for solving boundary-element discretizations of potential 
integral equations. The grid-projection scheme is then combined with a precorrected- 
FFT style multilevel method for solving potential integral equations with ^ and e lkr /r 
kernels. A complexity analysis of this combined method is given to show that for 
homogeneous problems, the method is order nlogn nearly independent of the kernel. 
In addition, it is shown analytically and experimentally that for an inhomogeneity 
generated by a very finely discretized surface, the combined method slows to order 
n 4 / 3 . Finally, examples are given to show that the collocation-based grid-projection 
plus precorrected-FFT scheme is competitive with fast-multipole algorithms when 
considering realistic problems and 1/r kernels, but can be used over a range of spatial 
frequencies with only a small performance penalty. 

1. INTRODUCTION 


In the last several years, there has been a significant increase in the volume of 
research on discretized integral equation, or boundary-element, solvers[l]. Boundary- 
element methods have always been an appealing approach for solving exterior 
problems, because such methods only discretize domain boundaries and not exterior 
vol um es. The main difficulty with boundary-element methods is that they generate 
dense matrices which were expensive to solve. What has generated renewed interest 
in boundary-element methods is that the combination of iterative solvers, such as 
Krylov-subspace methods, and matrix sparsification techniques, like fast-multipole 
and multilevel methods, have been used to create very fast boundary-element 
codes [2, 3, 4]. 

Fast-multipole based codes for solving potential problems with £ kernels are 
now commonly used in a variety of engineering applications [5]. What is now of 
primary research interest is developing sparsification procedures for boundary-element 
matrices which are capable of solving potential problems with relatively general 
kernels, at least including \ and — for a wide range of kr [6, 7, 8, 9, 3, 10]. Such a 
direction parallels the recent work on using multigrid methods to solve the Helmholtz 
equation [11, 12]. 

*This work was supported by ARPA contracts N00174-93-C-0035 and J-FBI-92-196 as well 
as grants from the Consortium for Superconducting Electronics, the Semiconductor Research 
Corporation (SJ-558), IBM and Digital Equipment Corporation, and an NDSEG fellowship. 
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In this paper we analyze errors and complexity for a general collocation-grid- 
projection scheme for use in a precorrected-FFT style algorithm for solving integral 
equations with general kernels. In the next section, we briefly review the boundary- 
element method for solving potential integral equations and give a brief description 
of the precorrected-FFT approach. In Section 3, which contains the main theoretical 
results of this paper, we give rigorous error bounds for a collocation-based grid- 
projection scheme. In Section 4, we address the issues of algorithm computational 
complexity, and analyze the homogeneous case as well as one type of inhomogeneity. 
In Section 5, we give some experimental results to show that the collocation-based 
grid-projection plus precorrected-FFT scheme is competitive with fast-multipole 
algorithms when considering realistic problems and 1/r kernels, but can be used 
over a range of spatial frequencies with only a small performance penalty. 

2. PROBLEM FORMULATION AND THE PRECORRECTED-FFT ALGORITHM 


Laplace or Helmholtz problems, with a combination of Neumann or Dirichlet 
boundary conditions, can be cast into an integral equation form using monopole, 
dipole or combined-layer potentials [13]. In the combined-layer case, the potential is 
represented by 


ipix) = [ {G n (x,x') — irjG{x, x')}cr(x')da' , x G S (1) 

Js 

where x, x' G 9? 3 , S is a multiply-connected two dimensional surface in 5R 3 , G(x, x') = 
e ik\\x-x'\\ / 471 " j j x — x'\\ is the Green’s function for the Laplace (k — 0) or Helmholtz 
equation, G n is the surface normal derivative of G at x', cr(x') is the combined-layer 
density often referred to as a charge density, and 7? is a complex scalar which depends 
on k. 

For each point x for which u(x) is specified, the charge density satifies 

— + [ G n (x, x')a{x')da! — irj [ G(x, x')a{x')da! = u(x) (2) 

2 Js Js 

and for each point x where u n (x) is specified the charge density satisfies 

dn( x) Is Gn> x '} a ( x '} da ' + ~ dn(x) Is G ^ X ' x '^ a ^ da ' = u ^ x )- ( 3 ) 


Boundary-Element Discretization 


Boundary-element methods are commonly used to solve potential integral equa- 
tions like (2) and (3), but are easiest to describe when considering the simple first-kind 
integral equation of the form 


ij)(x) = / a(x')G(x, x')da', 
Js 


xeS. 


(4) 
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To compute an approximation to a, the boundary-element approach is to consider an 
expansion of the form 

n 

(5) 

Z=1 

where h^x),..., h n (x) : 9? 3 — > SR are a set of compactly supported expansion functions, 
and qi,...,q n are the unknown expansion coefficients. The expansion coefficients are 
then determined by requiring that they satisfy a Galerkin condition of the form 


Pq = p, 


( 6 ) 


where P 6 9? nXn is given by 

Pij = J hi{x ) J hj(x')G(x,x')da! da. 


(7) 


The approach used in many engineering applications is to approximate the surface S 
with N planar quadrilateral and/or triangular panels, in which case the support for 
hi is just a single panel. 


The precorrected-FFT technique 


If Gaussian elimination is used to solve (6), 0(n 3 ) operations and 0(n 2 ) storage 
are required. Typical engineering problems may have thousands or tens of thousands 
of panels, so that Gaussian elimination is not a feasible approach. In [14, 15] it was 
shown that the precorrected-FFT method described below is an efficient approach 
to solving (6), reducing the number of operations and memory required to nearly 
0{n\ogn). As can be seen from Fig. 1, for solution of Laplace’s in typical engineering 
geometries, the precorrected-FFT method is superior to fast multipole algorithms in 
terms of computation time and memory requirements. 

Consider solving (6) by using a Krylov-subspace technique such as GMRES [16]. 
The dominant costs of such an algorithm are in calculating the n 2 entries of P using 
(7) before the iterations begin, and performing n 2 operations to compute the dense 
matrix- vector product on each iteration. To develop a faster approach to computing 
the matrix-vector product, after discretizing the problem into n panels, consider 
subdividing the problem domain into an array of small cubes so that each small 
cube contains only a few panels. Several sparsification techniques for P are based on 
the idea of directly computing only those portions of Pq associated with interactions 
between panels in neighboring cubes. The rest of Pq is then somehow approximated 
to accelerate the computation [2] . 

One approach to computing distant interactions is to exploit the fact that 
evaluation points distant from a cube can be accurately computed by representing 
the given cube’s charge distribution using a small number of weighted point charges 
[17]. Pq can then be approximated in four steps: (1) project the panel charges onto 
a uniform grid of point charges, (2) compute the grid potentials due to grid charges, 
(3) interpolate the grid potentials onto the panels, and (4) directly compute nearby 
interactions. This process is summarized in Figure 2. 
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Example 

Speed 

Memory 

micromotor 

0.68 

0.81 

cube 

0.73 

0.31 

woven bus 

0.63 

0.42 

bus crossing 

0.43 

0.26 

via 

1.42 

0.37 

DRAM cell 

0.80 

0.73 


Figure 1: Comparison of performance 

of FFT-based to multipole-based codes 
for 1/r Green function. “Speed” is 
ratio of matrix-vector product time of 
precorrected-FFT method to fast multi- 
pole based method, “memory” the ratio 
of required storage. 
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Figure 2: 2-D Pictorial representation of 
the four steps of the precorrected-FFT al- 
gorithm. Interactions with nearby pan- 
els (in the grey area) are computed di- 
rectly, interactions between distant panels 
are computed using the grid. 


There are several possible approaches to computing the grid charge. Analysis 
of one possible scheme is presented in Section 3. When the grid charges have been 
determined, their potentials at the grid points must be computed. The potential i^(x) 
at a point x = ( x , y, z) is the sum of the potentials from all the grid charges q(x'), 

^(*) = IX®, *')?(*') . (8) 

x' 

The free-space Green function g{x , x') = g(x — x\ y — y', z — z') depends only on the 
relative difference between the points x and x' . Therefore, because of the regular grid, 
the computation of the grid-charge potentials at the grid points is a three-dimensional 
discrete convolution. This convolution can be rapidly computed by using the Fast 
Fourier Transform[18], requiring O(NlogN) operations. Once the grid potentials 
have been computed, they must be interpolated to the panels. 

In the computation of panel potentials due to grid charges, the portions of Pq 
associated with neighboring cube interactions have already been computed, though 
this close interaction has been poorly approximated in the projection/interpolation. 
Before computing a better approximation, it is necessary to remove the contribution 
of the inaccurate approximation. It is possible to construct a “precorrected” direct 
interaction operator, P^ r , which consists of the direct interaction operator P a ^ for 
neighboring cells a and b, with the errors introduced by the grid-charges exactly 
subtracted out. When used in conjunction with the grid charge representation, 
P%£ results in exact calculation of the interactions between panels which are close. 
Assuming that the Pq product will be computed many times in the inner loop of an 
iterative algorithm, P%£ will be expensive to initially compute, but will cost no more 
to subsequently apply than P a 

3. GRID-PROJECTION SCHEME 

In this section, we describe and analyze accurate operators for projecting charge 
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densities onto the grid and for interpolating potentials from the grid, the two problems 
being equivalent as noted in [3]. 


The Collocation Grid-projection and Interpolation Operators 


Consider approximating the potential of a charge distribution p{x) by a set of 
N g point charges, Qj,j = 1 . . . N G which are positioned at points Xj. Suppose also 
that both the point charges and the charge distribution lie entirely inside a sphere of 
radius a centered at the origin. We will require that the potential of the point charges 
and the potential of the true charge density match at a set of N < N G collocation 
points x c> k, k = 1 ... N on a closed surface which encompasses the sphere of radius a. 
That is, for each k , 

Y,QjG(xj,x c>k ) = J p{x')G{x' ,x C! k)dx' 

where G(x, x') is the relevant Green’s function. It will be convenient if the surface 
is chosen to be a sphere of radius r c > a, and the collocation points are chosen to 
be the abscissas of a quadrature rule on the sphere. Integration rules of arbitrary 
order on a sphere can be constructed by product techniques, but more efficient non- 
product rules exist [19] which will generally be sufficient for our purposes. By careful 
selection of the quadrature rule, at least for the orders we have checked, it is possible 
to insure the grid charge does not substantially exceed the net cube charge. That is, 
for appropriately selected quadrature rules, 

52\Qj\ < K ! \p{x')\dx' (9) 

j 

where k is a constant independent of order. 

In addition to constructing operators that represent panel charges by grid charges, 
it is necessary to construct operators, of comparable accuracy, that interpolate 
potentials at the grid points to the charge panels. 

Lemma 1. If W is an operator which projects charge onto a grid, W T is an 
operator which interpolates potential at grid points onto charge coordinates, and W 
and W T have comparable accuracy. 

Proof. Suppose that a unit charge at the point x 0 is represented by the vector of 
grid charges q g . The approximate potential T(y) at a point y is given by 

$(y) = Y,9(xi,y)q g = g T q g 

% 

where Xi is the position of the ith charge, and g{xi, y ) the Green function. Conversely, 
suppose there is a unit charge at y, and the potential at x 0 is to be computed. Then, 
if V is the interpolation operator, 

*(®o) = '52 V ( x 0’ x i)9( x iiV) = V 9 
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For a symmetric Green function, ^(xo) = g(xo,y ) = g{y,x 0 ) = ^(y), so that 

&(x 0 ) - ^(® 0 ) = Vg- tf(x 0 ) = ( g T V T ) T - t(j/) = #(y) -V(y) = (g T q g ) - V(y) 

if we require V = qj. In other words, if W is an operator which represents a charge 
at point xq by grid charges, W T interpolates potential at the grid points onto the 
point £o, and W and W T have the same order of accuracy. □ 


Error analysis 


First we establish error bounds for the approximation of a panel charge potential by 
grid charges. 

Lemma 2. Suppose a grid-charge representation of a charge distribution p(x), 
of total charge Q — f \p{x')\dx' , lying inside a sphere S(a ) of radius a centered at 
the origin, has been constructed. Assume the grid charges Qj are given at points 
Xj, j = 1 . . . Nq, and define Q g = J2f=i \Qj\- The error <f e in the grid-charge 
approximation of the potential in the k = 0 case satisfies 


\fa\ < 


Q + Qg / Q \M+i (M -f l) 2 + 1 
Tm ^m, 1 ifl/l’m) 


( 10 ) 


where M is the order of the quadrature rule and r m is the distance of the nearest 
potential- evaluation evaluation point to the origin, r m > a. 

Proof. The multipole expansion of potential 4> of the charge distribution is [20] 

1 


OO l 1 

(l>(r, 9, fa = 4 tt £ Y, 


1=0 m——l 


21 + 1 r l+1 Js( a) 


[f (11) 

JS( a) 


Similarly, the multipole expansion of the grid-charge potential (f) g (r , 9, fa is 

oo 2 i i Na 

+ 


°° ^ 1 1 Ng 

Mr, 0, <t>) = ^ E E QS+M, M ( 12 ) 

1=0 m=—l M L T j = 1 


Let ( r c , 9k, (f>k) denote the kth collocation point, k = 1 ... N, on the surface of the 
sphere of radius r c . Assume that the (9 k , fa) are the abscissas of a quadrature rule 
on a sphere such that the rule exactly integrates spherical polynomials of degree at 
least 2M. Let Wk,k = 1 ... N denote the quadrature rule weights corresponding to a 
sphere of radius unity. 

At a collocation point, the error in the potential, fa(r c ,9k,fa) = far c , 9k, fa) — 
fa(r c , 9k, fa) is zero, so we may write 


M l 

E E 

2=0 m=—l ' c 
oo 


r-2+1 


Ql,mfalm(@k, 'fk) — 


1 1 
47 T T T 


1 


2=M+1 m=—l 


21 + 1 ri +1 


. N a 

r'‘ p(x')Y{ m (ff . 4>')dx' - ■£ Qj+UOj, M 

JSw j — i 


(13) 

Ylm{.9k, 4>k) 
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for k = 1 . . . N, where q^ m is given by 


Ql,m 


47 r 

2Z + 1 



Multiplying each side of (13) by WkYy m ,{6k,(j)k) and summing over k leads to a 
simplified form. From the identity 

and the quadrature rule for selecting Wfc, it then follows that 

E WkYvm'iPk, (/>k)Yim(O k , <Pk) = 8 iv 8mm! 

k 

for l + l' <= 2 M. Therefore, 


1 

rV+ 1 Ql',m' 


(14) 


N o° i i i r t 

4 *1t w k E E \ S (/^ x ' )Y ^ e '^' )dx '~ 

k = 1 Z=M+1 m=-i "T - 1 r c l JS ( a ) 

Yl*m'(Qk, (j>k)Ylm{0k, <Pk) 


N g 

EQrfYZnVj’tj) 

3=1 


The addition theorem for spherical harmonics states 


47 r 


21 + 


Y E = a(c os 7 ) 


m=—l 


where 7 is the angle between (0', 0') and (0, <£) and Pi { cos 7 ) a Legendre polynomial. 
The addition theorem provides a bound 


47T 

21 + 1 


1 


e 


< 1 


since |p(cos y)| < 1 , as well as a bound on the magnitudes of the spherical harmonics, 


121' + 1 


47T 


Since ^ 

•/5(a) m= _ / 21 + 1 

and using the additional fact J2 k w k — 47 t, we can bound the sum of the infinite series 
on the right-hand side of (14) to obtain a bound on the (/', m!) multipole coefficient 
of the error 
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(15) 




oo 

,, m .\<rl(Q + Q a )(i«)<JeX' + l)lit £ (-)' 


or 


\Qi 


<,™-l < ri(Q + Qa)( 4+^/(2l' + l)/47r(f ) 


i=M+l 
M+l 1 

7y 1 - (a/r c ) 


= Qv- 


(16) 


Using the multipole expansion truncation bound in [2] a bound can be derived for 
the error in the potential, \(j) e \, 


M 

W < £ An/( 2i ' + 1)/^ + 


Q + Qg ^y{+i 


1 


/=()■ ■ i-WO' 

After substituting the expression for qi from (16), (17) becomes 

m „ d \ n ~ „ i 


(17) 


a 


0,1 < ;(Q + Qc)(fr» (a,r) 


• ( 18 ) 


Depending on the relative size of r c and r, we may obtain two bounds on the 
magnitude of the error, 


< 


and 


I0e| < 


Q + Qg 


Q + Qg 


M+l (M + l) 2 + ^ M+ i 




M+l 


l-(a/r c ) r 1 - (a/r)J 


(M + l) 2 i 1 

l- (a/r c ) r 1 - (a/r)J 


r c <r 


ry > r. 


(19) 


(20) 


In the potential evaluation process, the worst-case error will occur at the point of 
smallest r. If we require that r c > r m , the lemma is proved. □ 


We now have the main result of the paper. 

Theorem 1. Suppose the potential of a point charge is given by 1/r. The grid- 
based technique for evaluating, outside a sphere of radius r m , the potential of a charge 
density of total charge magnitude Q, located inside a sphere of radius a, has error 4> e 
bounded by 

n „ ( A/T _i_ i w _i_ 1 

( 21 ) 


w < (!+*)%?_)«+■(" ± 21+1 


T m. T' n 


1 (u/r TO ) 

where 2 M is the order of a quadrature rule on a sphere. 


Proof. The theorem follows directly from Lemmas 1 and 2. 


□ 


Helmholtz Kernels 

Suppose that outside a sphere of radius a, a function if satisfying the Helmholtz 
equation is represented by a multipole expansion whose moments up to order N vanish 
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(22) 


OO l 

^{r,0,4>)= J2 J2 Pi,mh?\kr)Yim(B,<l>) 

l=N + 1 m=-l 

where k is the wavenumber and h\^ = ji(kr)+iyi(kr ) is the first-kind Hankel function 
of order l. 

For such a potential the following lemma exists [7]: 

Lemma 3. For N > ka and any r > a, there exists a c > 0 such that 

W(r,M)l<<^)" +1 (23) 


Theorem 2. Suppose the potential of a charge is given by e lkr fr. If the collocation 
points in the grid- charge assignment are chosen to be the abscissas of a quadrature 
rule which exactly integrates spherical harmonics of order < 2 ka, i.e., 


M > ka 


(24) 


for a quadrature rule of order 2 M, then the grid-based technique for evaluating, outside 
a sphere of radius r m , the potential of a charge density of total charge magnitude Q, 
located inside a sphere of radius a, has error <f e bounded by 


Q i a \M+ 1 


\(fe\ < c(l + «) — ( — ) 

' m, i ‘ 


(M + l) 2 + l 

1 - (a/r m ) 


(25) 


Proof. Given the conditions of Lemma 3, the proof follows exactly as for Theorem 

1. 


□ 


Applications and competing approaches 


While the grid operators described here were developed with the precorrected- 
FFT technique in mind, they can be incorporated into any multi-level scheme [10, 3]. 
The representation described here has two advantages which allow it to be efficient. 
First, because of the regular spacing of the grid charges, fast ( O (l 2 ) log l , where l 
is the order of the quadrature rule) translation and potential evaluation operators 
exist. It appears that in the approach in [10], only the 0(1 4 ) direct operators are 
available. Secondly, the sharing of grid charges between computational cells allows 
for a reduction in the total number of coefficients needed to represent the potential 
in each cell of the computational domain. That is, if there are N cells in the domain, 
and p 3 grid charges are used to represent the potential in each cell, then, for large N 
where we may neglect edge effects, the total number of grid charges is only N(p — l) 3 , 
a significant reduction for small p. For most engineering problems, we expect p < 5, 
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so the sharing effect will still be significant. An additional advantage of the grid- 
based approach is that the potential throughout the domain can be obtained at little 
additional cost once the panel charges have been determined [21]. 

4. COMPLEXITY ANALYSIS 


We first consider the case where the panel charges are evenly distributed 
throughout space. 

Theorem 3. For a homogeneous distribution of N panels, the precorrected-FFT 
method requires 0(N log N) operations to perform a potential calculation. 

Proof. Assume space has been divided into an array of M x M x M cells, and 
that there are about N = n 3 panels evenly distributed throughout the M x M x M 
cube, so that there are about ( n/M ) 3 panels in each computational cell. Finally, 
assume that the grid in each cell is a p x p x p array. There are three components 
in the cost of the precorrected-FFT method. We assume that any costs associated 
with forming the grid projection operators are negligible, since these calculations only 
need be performed once, not at each GMRES iteration. 


• Cost of direct interactions 


C. 


( ^ \ 6 n /fS ^ 

n = a(-)M = c— 


« Cost of grid projection and interpolation 

CV = 7M 3 (^)V = 7 n 3 p 3 

which is independent of M. 

• Cost of the FFT 

Cf — fip 3 M 3 log 2 Mp 


If we assume that M is proportional to n, then the total cost of the algorithm is 
0(n 3 + n 3 log 2 n) = 0(N log 2 N). □ 


For the boundary-integral methods considered in this paper, however, the panels 
are usually not homogeneously distributed. 

Theorem f. For a single closed surface at fixed k the precorrected-FFT method 
requires 0(N e/5 log N) operations to perform a potential calculation, where N is the 
number of panels. 

Proof. Again assume space has been divided into an array ofMxMxM cells, and 
that the surface measures about n panels wide along each side of the MxMxM cube, 
so that there are about N ~ n 2 panels total, and (n/M) 2 panels in each computational 
cell which is occupied. About M 2 cells will have panels. To determine the complexity 
of the method, the optimal number of cells M must be determined as a function of 
problem size, n. The analysis proceeds as above: 
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® Cost of direct interactions 


n ’ 


c D =d-r 


® Cost of grid projection and interpolation 

Ci = jM 2 (^) 2 p 3 = jn 2 p 3 

which is independent of M. 

® Cost of the FFT 

C' F = pp 3 M 3 log 2 Mp 


Neglecting for the purposes of optimization the logarithmic factor, the total cost 

rc 4 

Cd = a J^2 + + Pp 3 M 3 

which when optimized with respect to M gives 


so that 


M = 


2an 4 \ 1/5 

3 (3p 3 ) 


n 4 / 5 


C D oc n 12 / 5 = 0(N 6 / 5 ) 

Ci ocn 2 = O(N) 

Cp oc n 12,/5 log 2 np = 0(N 6 ^ 5 log 2 Np) 


□ 


In this analysis, we have assumed that p is constant. For a given problem, when 
solving the Helmholtz discretization as the frequency increases, generally the number 
of panels must increase to retain a fixed number of panels per wavelength. However, 
the size of a computational cell decreases proportional to 1 /M, or as nr 4 ' 5 , slower than 
n. Thus, for high frequencies the criterion in (24) that the order of the quadrature 
rule be greater than 2k A will be violated. We must allow p to vary with n to obtain 
the correct complexity analysis, which gives a different complexity bound. 

Theorem 5. For a single closed surface the precorrected-FFT method with y/N 
proportional to k requires at most 0(N^ 3 logN) operations to perform a potential 
calculation, where N is the number of panels. 

Proof. Assume the size R of the computational domain is fixed. Further, assume 
a fixed number of panels per wavelength, n ~ 1/A ~ k is required to maintain the 
solution accuracy. Then kA = kR/M ~ n/M. The number of collocation points 
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necessary for order l quadrature is Oil 2 ), which is of the same order as the number 
of grid charges per cell, p 3 . Thus we have 


p ~ 



Repeating the above complexity analysis, we have 


• Direct cost Cp = 0(n 4 /M 2 ) 

• Interpolation cost Ci = 0{p 3 n 2 ) = 0(n 4 /M 2 ), same order as the direct cost 

• FFT cost C F = 0{M 3 p 3 log 2 Mp) = 0(Mn 2 ) 

The total cost is thus C F = 0(Mn 2 + n 4 /M 2 ) which when optimized for M gives 

M = 0{n 2/3 ) 


The asymptotic cost of the entire algorithm is then 0{N 4 / 3 log 2 IV), a slight 
increase over the 0(IV 6 / 5 ) in the case of Poisson’s equation, and competitive with 
two-level multipole based schemes for the Helmholtz equation [7] . 

We should also note that the cost of forming the grid projection operators, 
0(p 9 ) = 0(n 2 ) = O(N) remains reasonable. □ 


5. COMPUTATIONAL RESULTS 


Empirical Grid Error Analysis 

In Figures 3(a) and 3(b), the errors in the potential due to the grid charge 
approximation are shown for two values of the collocation sphere radius r c , in the 
Laplace (k = 0) limit. In Figure 3(a), with r c small, for all orders of approximation 
the error decays slowly away from the charge distribution. Since in this case r c ~ r min , 
we expect the error to behave essentially as a monopole, dying slowly away from 
the origin, regardless of the order of the quadrature rule. We only expect the 
order of quadrature rule to change the constant factor in front of the error term. 
Notice in Figure 3(b), where r c is considerably larger, the worst-case errors have not 
changed much, as predicted by our previous analysis. The variation of error with 
distance, however, changes drastically. As the collocation sphere radius is increased, 
the magnitude of the low order multipole coefficients of the error decreases, and the 
errors decay rapidly with distance. Note that the sharp error decay associated with 
high order multipole approximation ends at about the collocation sphere radius. 

In Figures 3(c) and 3(d), we consider errors in the Helmholtz equation. At low 
k , all three order schemes considered still exhibit acceptable error properties (if an 
acceptable worst-case error is of order HR 4 — 1CU 3 ). As k is increased, however, the 
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(b )r c = 6d, k = 0 



Figure 3: Error in grid approximation of potential of 100 charges of random strength 
Q € [0, 1] located at random positions inside a cube of side length 2d centered at 
the origin. Collocation sphere radius is r c = 1.5 d (left figure), r c = 6 d (right figure). 
Solid line: p = 3, order 7 quadrature rule. Dash line: p = 4, order 11 quadrature 
rule. Dash-dotted fine: p = 5, order 14 quadrature rule. 


low-order schemes become inaccurate, and the high-order scheme (p = 5) becomes less 
accurate, though still retains acceptable accuracy for this relatively high frequency 
(at this freqency, the basic computational cell is more than a wavelength long). 


Computational Examples 


First we analyze the behavior of the precorrected-FFT method as a function 
of problem size, for Laplace and Helmholtz kernels. A cube is discretized into 
quadrilateral panels, with n panels along each size. The time required to perform 
a matrix-vector product, and the memory necessary for the linear system solution, 
is then tabulated for n ranging from 15 to 100. For the Helmholtz problem, we will 
require that the discretization have 15 panels per wavelength along each side of the 
cube. For a unit cell of length A, the order p of the grid representation and order M of 
the quadrature rule are chosen by the rules: kA < 1.75 corresponds to p = 3, M = 7 
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Memory use (Mb) 



number of panels 


Figure 4: CPU time and memory use for discretized cube, x: Laplace problems. *: 
Helmholtz problems, with kn = 15, n the number of panels along a side of the cube. 
Dash line: best fit line to Laplace data: assumed time, memory = Cn a , computed 
a = 1.16 for CPU time, a = 1.11 for memory use. 


(26-point rule); 1.75 < kA < 2.75 corresponds to P = 4, M = 11 (56-point rule); 
kA > 2.75 corresponds to p = 5,M = 14 (72-point rule). The results are shown in 
Fig. 4. 

The results for Laplace’s equation follow the expected 0(N 1 - 2 ) behavior very 
closely. Some degree of irregular growth is apparent in the plot as a result of changing 
grid levels. The cost of the precorrected-FFT method is generally greater when the 
Helmholtz kernel is used, in part because complex quantities must be manipulated, 
but mostly because a higher-order grid representation is necessary to accurately 
represent the charge in a cell. For the range of frequencies considered, the problems 
with a Helmholtz kernel appear to be roughly a factor of 2 — 10 slower than the 
problems with a Laplace kernel. The growth with problem size of computation time 
and memory usage seems to be fairly irregular, for the choice of grids considered 
here. The observed irregularity occurs because the order of the approximation must 
change to maintain a fixed relationship between the wavelength and the size of a 
computational cell. 

Now we demonstrate that the precorrected-FFT technique can accurately compute 
solutions of integral equations with an oscillatory kernel. Assume a sphere of radius 
a, with the boundary conditions 

u(x) = (ka) sin 2 9 cos 9 cos 2 <fi 

which has solution if)(r,9,(f>) = (kr) sin 2 9 cos 9 cos 2<p. The sphere was discretized 
along longitudes and latitudes, with 50 divisions in each variable, to generate a 
problem with 2600 panels. We take k = 47 r, corresponding to a sphere 4 wavelengths 
in diameter. Fig. 5 shows the computed results. The agreement is excellent, and 
closer inspection shows the error in the computed fields to be less than 10“ 3 , on the 
order of the GMRES tolerance. We have encountered no computational difficulties 
at much smaller or moderately larger wavelengths. 
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Figure 5: Solid line: real part of exact solution. Dashed line: imaginary part of exact 
solution, x: computed real part of solution. +: computed imaginary part of solution. 

6. CONCLUSIONS 


In this paper we described and carefully analyzed a collocation-grid-projection plus 
precorrected-FFT method for solving potential integral equations with ^ and e lkr /r 
kernels for a wide range of k. We demonstrated experimentally and analytically that 
the errors are well-controlled, and showed that the method is competitive with fast- 
multipole algorithms for £ kernels but is much more general. It should be noted that 
the collocation-grid-projection plus precorrected-FFT method can be combined with 
the multilevel methods in [3] to minimize the effects of inhomogeneity, but we have 
yet to see the need for such an approach in practical applications. 
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MULTIGRID TECHNIQUES FOR HIGHLY INDEFINITE EQUATIONS 


Yair Shapira 

Computer Science Department, Technion — 
Israel Institute of Technology, Haifa 32000, Israel. 


SUMMARY 

A multigrid method for the solution of finite difference approximations of elliptic PDEs 
is introduced. A parallelizable version of it, suitable for two and multi level analysis, is 
also defined, and serves as a theoretical tool for deriving a suitable implementation for the 
main version. For indefinite Helmholtz equations, this analysis provides a suitable mesh size 
for the coarsest grid used. Numerical experiments show that the method is applicable to 
diffusion equations with discontinuous coefficients and highly indefinite Helmholtz equations. 


1 INTRODUCTION 

The multigrid method is a powerful tool for the numerical solution of elliptic PDEs [4]. 
Its rate of convergence, however, deteriorates when non-elliptic problems are encountered; 
this phenomenon is due to error components (modes, eigenvectors) which have nearly zero 
eigenvalues with respect to the coefficient matrix. For convection problems, for example, 
error modes which are smooth in the convection direction are nearly singular and require 
a special treatment [6] [7]. For indefinite equations, we distinguish two classes of problems: 
(a) slightly indefinite problems, for which very few modes with negative eigenvalues (say two 
or three) exist, and (b) highly indefinite problems, for which many more such modes exist. 
For class (a), the method of [5], which is based on filtering nearly singular modes, achieves 
convergence rates which are close to those for the Poisson equation. The Cyclic Reduction 
Multigrid (CR-MG) of [8] is also superior to standard multigrid. For class (b), a projection 
method (suitable for finite element schemes) is presented in [3]. The AutoMUG method of 
[16] [17] [18] and a variant of Black Box Multigrid [15] also achieve satisfactory convergence 
rates especially when supplemented with an acceleration scheme. The two latter methods 
can also handle diffusion problems with discontinuous coefficients. 

The aim of this work is to supply a suitable implementation for AutoMUG for highly 
indefinite Helmholtz equations. To this end, we introduce a parallelizable version of Auto- 
MUG, called Parallelizable AutoMUG (PAMUG). This method may be considered a gener- 
alization of the Parallelizable Superconvergent Multigrid (PSMG) of [11] to nonsymmetric 
and indefinite problems. PAMUG uses the fine grid at all levels, hence is suitable for par- 
allel architectures with a large number of processors; however, we do not use it as a solver 
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but only as a theoretical tool supplying a suitable implementation for AutoMUG. Due to 
its simple algebraic formulation, PAMUG is suitable for two-level analysis in some cases. 
Furthermore, in some model cases, including indefinite Helmholtz equations, the spectrum 
of the multi level iteration matrix is computable. This enables one to choose in advance a 
suitable mesh-size for the coarsest grid and a suitable acceleration scheme (if needed). Due 
to the similarity of AutoMUG and PAMUG, this implementation applies also to AutoMUG, 
as follows from numerical experiments. 

The content of this paper is as follows. In Section 2 AutoMUG and PAMUG are defined. 
In Section 3 they are analyzed. In Section 4 numerical experiments (using AutoMUG) are 
reported. 

2 THE AutoMUG AND PAMUG METHODS 
2.1 Abstract Definition of a Multi Level Method 

We start with an abstract definition of a multi level (ML) method for the solution of the 
linear system of equations 

Ax = b. 

In the following, S : x —¥ Sx is a smoothing procedure and e, r, t and o are nonnegative 
integers denoting, respectively, the cycle index, the number of presmoothings, the number of 
postsmoothings and the minimal bandwidth of A (with some ordering of variables) for which 
ML is called recursively. The operators R (restriction), P (prolongation) and Q (coarse grid 
coefficient matrix) will be defined later. 

ML(^Cjjj, A, 6, x 0 ut) . 
if A is of bandwidth < o 
for some variable ordering 

x out ^ A b 

otherwise: 

Xi n <— Sx in (repeat r times). 

e «— 0 (1) 

ML(e,Q, 
e <— e ou t 

Xout ^ X{ n 

Xout Sxgut (repeat t times). 

An iterative application of ML is given by 

x 0 = 0, k = 0 

while \\Axk — b\\ 2 > threshold • || Ax 0 — b\\ 2 

ML(x k ,A,b,x k+1 ) (2) 

k <— k + 1 
end while. 


R(Ax in &)>Cout) 


-Pe 


repeat e times 
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Below we define the operators R, P and Q of (1) for AutoMUG, its variant AutoMUG(g) 
and the parallelizable versions PAMUG and PAMUG (g). 


2.2 Some Matrix Functions 


Let K be a positive integer and / the identity matrix of order K. For any matrix M, 
M = define the matrix functions 


rowsum(M) 

D(M) 

R(M) 

Q(M) 

P(M) 

S(M) 


K 

= diagiY.rrii^iKiKK 

3 - 1 

= diag(M) 

= 2 1-MD(M)- 1 
= R(M)M 
= 21 — D(M)~ 1 M 
= rowsum(P(M)). 


These definitions apply to AutoMUG and PAMUG. For AutoMUG(g) and PAMUG(g), 
replace the above definition of S(M) by S(M ) = (2 + q)I (the role of the parameter q 
will be explained later). Let Vk be the space of the K x /f-grid functions (it is assumed 
hereafter that the first point in a grid is numbered (1, 1)). Define the orthogonal projection 
O :Vk —» V[K/ 2 i by (Ov)ij = v 2 i, 2 j and the permutation U by 


(Uv)i,j = v jti , v £ V K . 


For any matrix B, we say that B is a A'-block matrix if B is block diagonal with tridiagonal 
blocks of order K, that is, 

B = blockdiag(B^)i<j<K, 


with 

B b) = tridiag(#\<£\ 4 J) )i<i</r, 1 < j < K. 


By the notation Hridiag’ we mean a periodically extended tridiagonal matrix, that is, = 
B^ k and d$ = B ( £ v We assume that either 


= 0, 1 < j < K 

or K — 2 k for some positive integer k. This guarantees that A and the coarse grid coefficient 
matrices defined bellow are of property-A. Actually, the block submatrices B^ need not be 
of the same size; for simplicity, however, we assume that they are. Non-rectangular grids 
can be embedded into rectangular ones (see [9] [18]). 


2.3 Transfer and Coarse Grid Operators 

Here we define the operators R, P and Q used in (1) for linear systems which arise, for 
example, from finite difference approximations of elliptic PDEs. 
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Let N and n be positive integers, where n < |_log 2 N\ denotes the number of levels minus 
1. Assume that A is of the form 

A = X + F, (3) 

where X and UYU are A"- block matrices. For example, if 

Bh 2 

X = UYU = blockdiag[tridiag(— 1, 2 — , —1)], (4) 

(where (3 is a parameter and h is the cell size) then A represents a five-point second order 
discretization of the Helmholtz equation 


'U'xx 'U'yy f (^) 

in a square (the unit square is used here) . 

Define X 0 = X and Fo = F. For i = 1, . . . , n, define the matrices Ri, Pi and A*, in this 
order, by 


Xi 

= S(Yi_i)Q(Xi-i) 

Yi 

= S(Xi-i)Q(Yi-i) 

Ri 

= OR(Yi-i)R(Xi-i) 

Pi 

= P(Xi-i)P(Yi-i)0 

Ai 

= 0 (Xi + Yi) O t . 


These definitions apply to AutoMUG and AutoMUG(g). For the parallelizable versions PA- 
MUG and PAMUG (g), they are modified as follows: omit the operators O and O t in the 
above definitions and replace the definition of Pi by Pi = /. The parameter q in AutoMUG (q) 
and PAMUG (g) is chosen by the user such that S(X^i) and S(Yi-i) are optimally approxi- 
mated, in some sense, by (2 + q)P, for example, if (3 in (5) varies with the spatial coordinates, 
then a reasonable choice for q is an average value of -/3h 2 /4. PAMUG(g) and AutoMUG (g) 
are suitable for two-level analysis. For simplicity, q = 0 is used in most of this analysis. 

The ML procedure, namely ML(z in , A, b, x ou t) defined in (1), is called n + 1 times per 
iteration. In the (n + l)st time, it is a direct solver. In order to implement AutoMUG, 
AutoMUG(g), PAMUG or PAMUG(g), the ith call to the ML procedure, 1 < i < n, uses 
the operators 

Q 4 — Aj, R 4 — Ri and P 4 — Pi. 

Note that, for PAMUG and PAMUG (g), Ai includes four independent subsystems, each of 
which corresponds to odd (even) numbered variables in the x and y spatial directions (see 
[18]). Furthermore, the coarse grid equations in PAMUG and PAMUG (q) corresponding 
to even numbered variables in both spatial directions are identical to those of AutoMUG 
and AutoMUG(g), respectively. Roughly speaking, these methods have a similar effect on 
low frequency error components, hence it is likely that convergence rate estimates for the 
parallelizable versions are fair approximations to those for the sequential ones. This is veri- 
fied in Corollary 1 and the numerical experiments in Section 4. For certain examples, e.g., 
convection-diffusion equations with periodic boundary conditions, AutoMUG and PAMUG 
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are equivalent to AutoMUG(O) and PAMUG(O), respectively, because all the row-sums used 
in AutoMUG and PAMUG are equal to the constant number 2 (as a matter of fact, Auto- 
MUG is equivalent to AutoMUG (0) also for other types of boundary conditions, provided 
that N is odd). This is also the case for either definite or indefinite Helmholtz equations, 
provided that an appropriate 0 is used. Hence, one can learn about the features of Auto- 
MUG (which is actually used in our applications) from the analysis of PAMUG, PAMUG(g) 
and AutoMUG (q). 


3 ANALYSIS OF PAMUG AND AUTOMUG 


3.1 Two-Level Analysis 

Here we derive upper bounds for convergence rates for PAMUG(O) and AutoMUG(O) applied 
to a class of equations, including Symmetric Positive Definite (SPD) Helmholtz equations 
(e.g., (3h 2 / 2 < 4sin 2 (7rh/2) in (4)). These bounds are independent of the size of the problem 
and the clustering of the eigenvalues near zero. This implies that AutoMUG is capable of 
handling nearly singular eigenvalues; hence, it may solve highly indefinite problems, provided 
that the negative eigenvalues are handled by a suitable acceleration scheme (see also Section 
3.3). 

Since PAMUG is designed for parallel implementations, it may be assumed that the 
damped Jacobi iteration, which is perfectly parallelizable, is used as a smoothing procedure 
(for some architectures, two damped Jacobi relaxations are less expensive than one red-black 
Gauss-Seidel sweep). This simplifies the analysis considerably. 

The order in which smoothing and coarse grid correcting are performed is immaterial, due 
to the commutativity of the smoothing and coarse-grid correcting operators. For consistency, 
however, we consider damped Jacobi iterations for presmoothing and other methods (e.g., 
Jacobi) for postsmoothing. 

Theorem 1 Assume that 


• X and Y commute with each other. 

• D(X) = D(Y) = I (isotropy assumption). 

• the spectra of X and Y lie in the interval (0,2) (e.g., X and Y are symmetric M- 
matrices or symmetric irreducibly diagonally dominant matrices, see [20]). 


Then the convergence factor for a two-level implementation of PAMUG (0) with r damped 
Jacobi presmoothings (with damping factor 2/3J and no postsmoothings is bounded from 
above by 


max 


3 r r 

4 (r + l) r+1 


3_ 1 
4e r 


( 6 ) 
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For the proof, see Appendix A. 

Corollary 1 Assume that A is normal. Then Theorem 1, with the bound in (6) multiplied 
by 2, applies also to AutoMUG(O), provided that one additional postsmoothing of the form 
x 4- POx is performed. 

For the proof, see Appendix B. 


3.2 Multi-Level Analysis for PAMUG 


Theorem 1 yields convergence rates for the two-level implementation of PAMUG (0) to es- 
sentially semi positive definite problems. This implies that indefinite problems may also 
be solved, provided that the negative eigenvalues are handled efficiently by an acceleration 
scheme. In this section, we give quantitative support for this heuristic. 

Theorem 2 Assume that the blocks in X and UYU are circulant Toeplitz matrices , that is, 

X = blockdiag[tridiag(bo,c 0 ,d 0 )] 

UYU = blockdiag[tridiag((3o, 70 , 5o)] 


for some constants bo, Co, do, fio, 7 o and 5 0 - Let 

Po 


bo + Co + do Po + To + 5o 

Qo = - 


co 


To 


For 0 < i < n — 1, define 


b i+ 1 = -(2 - qf)b\/ci c i+ i = (2 - &)(q - 2 Mi/c*) 
d i+ 1 = -(2 - qpdp/ci p i+ i = ( b i+ i + c i+1 + d i+ i)/c i+ i 
Pi+i = -(2 - Pi)Pf/ji Ti+i = (2 - Pi) (t< ~ Wihi) 
<^i+i = — (2 — Pp^i/li qi+i = (Pi+i + Tt+i + ^»+i)/t*+i- 


Define 


g(c, t;p, q]x,y ) 


(2-x/c){2-y/j)(x + y) 

(2 - q)x{ 2 - x/c) + (2 - p)y( 2 - y/ 7 ) 

( x + V V 

f r (c, t; p, 9 ; x, y ) = g{c, 7 ; p, <t, x, y) (1 - ^ c+ J 

/ r (n_ 1 ) (x,y) = fr(c n -u'rn-T,Pn-l,qn-T,X,y). 

For i = n — 2, n — 3, . . . , 0, define 

fr l) (x,y) = f r (ci, 'yf, Pi, qf, x, y) 

+ / r {l+1)£ (( 2 - qi)x{ 2 - x/ci ), (2 - pf)y { 2 - y/^)) 

x + y V 


(1 ~ g{ci,1u Pi, qi,x,y)) 1 - 


a(ci + J i ) / 
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Then there exists an orthogonal matrix T such that the iteration matrix of PAMUG (im- 
plemented with cycle index e and r damped Jacobi smoothings with damping factor or 1 ) is 
given by 

T diag{fl ^{x, y)}(x,y)£spect(X)xspect(Y)T. 

For the proof, see Appendix C. Theorem 2 yields an efficient way to compute in advance the 
spectrum of the iteration matrix of PAMUG. This method is employed below for our model 
problem. 


3.3 The Indefinite Helmholtz Equation 

As discussed in [5], the most problematic eigenvalues of indefinite equations are those which 
are close to zero. Theorem 1 and Corollary 1 show that PAMUG (0) and AutoMUG(O) 
handle positive eigenvalues arbitrarily close to zero, giving convergence factors which are 
independent of the size of the problem and the clustering of the eigenvalues. Although this 
applies to the two-level method and definite problems, it indicates that the algorithm may 
also be efficient for the multi level method and indefinite problems. In this case, however, 
the cell-size of the coarsest grid cannot be arbitrarily large, as is shown below. 

When the coarsest grid is not too coarse, numerical computations using Theorem 2 show 
that the PAMUG iteration matrix has only a few eigenvalues of magnitude larger than one. 
These eigenvalues may be annihilated (their corresponding error components are significantly 
reduced) by an appropriate Krylov space acceleration method applied to the basic multi level 
iteration (2). The remaining eigenvalues are considerably smaller (in magnitude) than one; 
good convergence rates are thus achievable, provided that the dimension of the Krylov space 
is large enough, say twice as large as the number of eigenvalues of magnitude greater than 
one. When the number of levels is large, so that very coarse grids are used, the spectrum 
of the iteration matrix significantly deteriorates; the magnitude of many eigenvalues then 
approaches one and exceeds it. 

Thus, Theorem 2 may help in choosing in advance an appropriate dimension for the 
Krylov space in the acceleration method. For highly indefinite problems, however, this di- 
mension must be rather large; in this case, a conventional acceleration method, such as 
GMRES of [14], will not do, since the required amount of storage (respectively, arithmetical 
operations) increases linearly (respectively, quadratically) with the dimension of the Krylov 
space used. The Transpose Free Quasi Minimal Residual method of [12] and the Conju- 
gate Gradient Squared method of [19], which use arbitrarily large Krylov spaces with fixed 
requirements of work and storage, are thus preferable. 

Consider the indefinite Helmholtz equation (5) in the unit square with periodic boundary 
conditions, discretized as in (3), (4). Our aim is to compute the spectrum of the PAMUG 
iteration matrix for this problem. In this case, 

spect(X) = spect(Y) = {4sin 2 {irj/N) - f3/{2N 2 )} l < j<N . 

Modes which are constant in either one of the spatial directions are excluded; this is equiv- 
alent to assuming that the right hand side includes no Fourier modes which are constant in 
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one of the spatial directions, and the equation is projected onto the linear subspace orthog- 
onal to the set of these modes. This situation simulates problems with Dirichlet boundary- 
conditions, since the spectrum of X and Y is not enlarged by the transformation 

periodic boundary conditions -» Dirichlet boundary conditions 
N N/ 2-1 

0 -> 0/4. 

One damped Jacobi smoothing (with damping factor 1/2) and two Jacobi smoothings are 
used in each level of a V-cycle. This implementation is chosen in order to cancel possible 
poles of the function g of Theorem 2 (and the proof of Theorem 1) and guarantee that the 
functions /W there are bounded. Indeed, it is verified that no pole of the functions /W is 
encountered during the computation. This choice was the most efficient one; using, e.g., 
damping factor 1/2 for all the three relaxations yields worse results. This is another place 
where the theory helps in choosing a suitable implementation; however, it is suitable only 
for ideal parallel machines, whereas in practice (Section 4) we use AutoMUG with the more 
efficient red-black Gauss-Seidel relaxation. 

The results are displayed in Figures 1 and 2. The last rows of these figures show how the 
spectrum deteriorates when the coarsest grid is too coarse. Here 0 = 3200 and we find that 
for N = 256 and 512, respectively, using 3 and 4 levels yields only a few large eigenvalues. 
The remaining eigenvalues are contained in [—0.25, 0.25], which implies that the effective rate 
of convergence should be around 0.25, provided that the large eigenvalues can be handled by 
the acceleration. Consequently, a 64 x 64 coarsest grid is suitable for achieving this rate of 
convergence. In light of the above discussion, it is expected that for Dirichlet problems and 
0 = 800 the choices N = 127 and N = 255 yield pictures which are much the same as those 
of Figures 1 and 2, respectively; hence a 31 x 31 coarsest grid is suitable in this case. When 
a further coarser grid, namely, a 15 x 15 grid, is used, the eigenvalues of the iteration matrix 
are clustered around ±0.7; thus, a convergence factor of at least 0.7 is expected in this case 
(see Table 2 below). It can also be inferred from the figures that the number of levels is 
immaterial; what matters is the cell-size of the coarsest grid alone. This is in agreement 
with a result of [3] (see also Table 1 below). 

There is also a physical explanation for the above lower bound on the resolution of the 
coarsest grid. For Equation (5), consider waves of wave number ( k , l) satisfying 7r 2 (h 2 + l 2 ) « 
0. Evidently, these waves appear in the solution, since they are amplified by the inverse of 
the operator. Hence, an appropriate coarse grid must be capable of approximating these 
modes. In particular, it should be sufficiently fine to approximate the above modes with 
k = 0 (resp., k = 1) and l = 0 (resp., I = 1) for periodic (resp., Dirichlet) boundary 
conditions. In light of the Nyquist rate, a proper approximation requires 2 points per wave 
length; this yields roughly [iV/2 n J > 2y/0/ir. 

Another explanation for the above restriction arises from matrix theory. It was observed 
that for sufficiently fine grids, the coefficient matrix is an L-matrix, that is, has positive main 
diagonal elements and nonpositive off-diagonal elements. For too coarse grids the amount of 
indefiniteness is so large that the main diagonal elements become negative, which leads to 
an inappropriate PDE approximation. 
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Figure 1: Eigenvalues (of magnitude > 0.25) of the PAMUG iteration matrix for the indefi- 
nite Helmholtz equation with (3 = 3200, N = 256 and periodic boundary conditions. 
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Figure 2: Eigenvalues (of magnitude > 0.25) of the PAMUG iteration matrix for the indefi- 
nite Helmholtz equation with /3 = 3200, N = 512 and periodic boundary conditions. 
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4 


NUMERICAL EXPERIMENTS 


4.1 A Comparison of Various Multigrid Methods 


We apply AutoMUG and several other multigrid algorithms to the problem 
-u xx - u yy - 800« = /, ( X , y) e Q, = (0, 1) x (0, 1), 

with complex boundary conditions of the third kind 


+ 10 iu = g (£, y) e T C <90 

(where n is the outer normal vector) and Dirichlet boundary conditions on <90 \ T. We 
consider the following cases: 

(a) r = 0 

(b) r = {0} x [o,i]. 

The equation is discretized via a second-order five-point difference scheme (as in (3)-(4)). 
Uniform N x N grids are used. The exact solution is u = xy. The initial guess is random in 
( 0 , 1 ). 

To the basic multi-level iteration (2), we apply the Transpose Free Quasi Minimal Resid- 
ual (TFQMR) acceleration method (Algorithm 5.2 in [12]), which avoids the computation 
of the transpose of the coefficient matrix and preconditioner (the latter is only implicitly 
given in (1), so its transpose is not available). TFQMR may be considered a modification 
of the Conjugate Gradient Squared (CGS) method of [19]. The costs of these acceleration 
techniques are comparable to that of the Conjugate Gradient method, that is, about 1-1.5 
work units per iteration. We found that the performance of CGS and TFQMR is similar; 
we preferred the latter, though, because of its smooth convergence curve. 

The multi level methods are implemented with the red-black Gauss-Seidel (RB) smoother 
in a V( 1,1) -cycle. The coarsest level equation is solved with six orders of magnitude accuracy. 

We define the following measures of efficiency: the convergence factor 

cf _ \\ Ax i as t - b \\ 2 

^Axi as t-i ^ j 1 2 


and the averaged convergence factor 


avcf = 


/ II Axi ast - 6|[ 2 \ 1/l<lSt 

V II4*o-&I| 2 ) 


where last is the smallest positive integer for which 

"J- 2 < threshold 
IIAxo - 6|| 2 

and threshold is about 10 -6 . When acceleration is used, the convergence factor often oscil- 
lates; hence, for the highly indefinite examples, only avcf is reported. 

AutoMUG is compared to 3 other multigrid methods which share the same complexity 
(that is, use 5-coefficient stencils at all levels): 
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1. Standard Multigrid (MG): coarse grid operators are derived from rediscretizations of 
the differential equation; full-weighting and bilinear interpolation are used for restric- 
tion and prolongation, respectively. 

2. Cyclic Reduction Multigrid (CR-MG) [8]: coarse grid operators, restriction and pro- 
longation are defined as in [8]. 

3. Full CR-MG (F-CR-MG): coarse grid operators are generated from [8]; full- weighting 
and bilinear interpolation are used for restriction and prolongation, respectively. 

The results are displayed in Tables 1 and 2. 


Table 1: Averaged convergence factors (avcf) for various multigrid methods (with TFQMR 
acceleration). The results show that once the resolution of the coarsest grid is fixed, the rate 
of convergence is independent of the number of levels. 


N 

levels 

■a 

MG 

F-CR-MG 

CR-MG 

AutoMUG 

gga 

4 

(a) 

.540 

.267 

.614 

.277 


3 

(a) 

.549 

.272 

.506 

.280 

63 

2 

(a) 

.561 

.273 

.404 

.312 

63 

2 

(b) 

.651 

.694 

.748 

.396 


Table 2: Averaged convergence factors (with TFQMR acceleration) showing the deterioration 
of convergence rates when the resolution of the coarsest grid is too coarse. 


N 

levels 

■a 

MG 

F-CR-MG 

CR-MG 

AutoMUG 

63 

2 

(a) 

.561 

.273 

.404 

.312 

63 

3 

(a) 

> .9 

.771 

> .95 

.737 


Remark: it was also found that for diffusion problems with discontinuous coefficients (e.g., 
Examples 7 and 9 in [18]) MG and both variants of MG-CR stagnate. 


4.2 Problems with Discontinuous Coefficients 


AutoMUG and two variants of Black Box Multigrid are applied to problems of the form 


with 


— V(DVu) — au = f in Q. = (0, w 2 ) x (0, U 2 ), 


j(t) 

D(x,y) 


I 0 0 < t < 

[ 1 UJi < t < U>2 

{ d r ( x , y) €: fl, 
d b (x, y) e Q,, 
d 0 


j(x) +j{y ) mod 2 = 0 

j(x)+j(y) mod 2 = 1 

{x,y) n 


? 
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a(x,y) 


f(x,y) = 


( 7 f 

{x,y) e n, 

j(x) +j(y ) mod 2 = 1 

Ob 

{x,y) 6 ft, 

j(x) +j(y) mod 2 = : 

o 0 


(• x , y) 0 ft 

0 

{x,y) e ft, 

j{x) +j(y) mod 2 = 0 

1 

(x,y) e ft, 

j(x) +j{y) mod 2 = 1 

0 


(x,y) 0 ft 


and mixed boundary conditions of the form 


Du n + 70 u = 0 x = 0 ov y = 0 
Du n + 7 \u = 0 x = u )2 or y = ^ 


(where uq, u> 2 , 70, 7i, d r , d 0 , a r , a b and a 0 are parameters). The finite volume discretiza- 
tion of [2] is used. However, since it results in a strong coupling between domains which 
are only weakly coupled in the PDE and, hence, in an inadequate scheme (see [2]), it is not 
applied to the original but to the modified problem —V(DVu) — au = /, where 


£ 

8 

D(x,y) 

a(x,y) 


df -I- dfo 
2 

<J r + 


min (dr/dt, db/d r ) 
min {d r /db, db/d r ) 


2 

£ 

D(x, y) 

8 m.&x.{\x — uj\\,\y — u\\) < h/2 

cr(x,y ) otherwise. 


\x — u)i \ + \y — u>\\ < h 
otherwise 


A uniform 63 x 63 fine grid is used (the only exception to this are Examples (12)-(13) in 
Table 3 representing the ‘staircase’ problem of [2], where a uniform 17 x 17 fine grid is used). 
When Dirichlet boundary conditions are imposed, it is denoted by y 0 = 71 = 00. In this 
case, no grid point lies on dft; all equations are non-trivial. The initial guess is zero. 

The results in Table 3 correspond to the following methods: (A) AutoMUG; (B) Black 
Box Multigrid [9]; and (C) the second method in [10]. For Examples (l)-(ll), these methods 
were implemented with coarse grids consisting of even numbered variables of the next finer 
grid (similar results, however, were obtained when odd numbered variables were used for 
this purpose). The off diagonal row-sum modification introduced in [10] is not used, since 
(apart from Examples (12)-(13)) coarse grids do not include boundary points of the next 
finer grid (see [15]). Also, prolongation is done without using the right hand side, since it 
was found in [15] that this does not improve the convergence for indefinite problems. 

The multigrid cycle is implemented as in the previous subsection. For methods (B) and 
(C), however, since 9-coefficient stencils are used, RB is replaced by the four-color ordering of 
[1]. Acceleration is used only for highly indefinite problems, namely, when max(cr r , cr&) > 100. 

A comparison of Examples (1) and (2) of Table 3 shows that, as implied by Corollary 1, 
AutoMUG (with no acceleration) performs for nearly singular Helmholtz equations almost as 
well as for the Poisson equation. For more highly indefinite problems, however, acceleration 
must be used. 
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Table 3: Three multigrid methods, (A) AutoMUG, (B) Black Box Multigrid and (C) the 
second method of Dendy (87), applied to definite and indefinite problems with discontinuous 
coefficients. Uniform 63 x 63 (resp., 17 x 17) fine grids are used for Examples (l)-(ll) (resp., 
(12)-(13), the ‘staircase’ problem). 

Description of examples 


example 

OJl 

i02 

7o 

7i 

df 

db 

d Q 

(X 'P 

C7 

O 0 

acceleration 

(1) 


1 

oo 

oo 

i 

i 

i 

0 

0 

0 

no 

(2) 


1 

CO 

00 

i 

i 

i 

20 

20 

20 

no 

(3) 


1 

oo 

00 

i 

i 

i 

400 

400 

400 

yes 

(4) 


D 

10* 

10* 

i 

i 

0 

400 

400 

0 

yes 

(5) 

30/62 

n 

10* 

10* 

i 

i 

0 

0 

400 

0 

yes 

(6) 

31/62 


10* 

10* 

i 

i 

0 

0 

400 

0 

yes 

(7) 

30/62 

B 

10* 

10* 

1000 

i 

0 

0 

400 

0 

yes 

(8) 

31/62 

H 

10* 

10* 

1000 

i 

0 

0 

400 

0 

yes 

w®m 

■ | 

m 

0 

19 

1 

a 

0 

0 


0 

no 

I 



0 

1 


B 

0 

0 


0 

no 

KH 

mm 

62 

0 

0.5 

IB8M 

B 

0 

0 


0 

no 


Numerical results 





cf 



avcf 


example 

levels 

A 

B 

C 

A 

B 

C 

(1) 

4 



.159 


.072 

.184 

(2) 

4 

1 

.431 

> 1 

1 

.507 

> 1 

(3) 

3 




.336 

.702 

.835 

(4) 

3 




.329 

.335 

.567 

(5) 

3 




.369 

.315 

.516 

(6) 

3 




.295 

.285 

.464 

(7) 

3 




.298 

.283 

> .8 

(8) 

3 




.291 

.341 

.530 

(9) 

4 


.118 

.238 

.151 

.114 

.267 

(10) 

4 

.381 


.211 

.429 

.142 

.232 

(11) 

4 

.148 

.987 

.988 

.192 



(12) 

2 

.153 

.121 

.133 

.196 

.141 

.151 

(13) 

3 

> 1 

.220 

.240 

> 1 

.237 

.269 
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Examples (9)-(13) deal with diffusion problems with discontinuous coefficients. In par- 
ticular, Examples (12)-(13) are the ‘staircase’ problem (Example IV in [2], where D = 1000 
inside the staircase and D = 1 outside). 

It is evident from Example (11) that Black Box Multigrid stagnates when the break point 
u)\ lies on the coarse grids. The reason for this is that the 9-coefficient stencils of its coarse 
grid operators involve strong- coupling between domains which are only weakly coupled in 
the PDE. Hence, in this case, the 5-coefficient stencils of AutoMUG are preferable (see [15] 
for a variant of Black Box Multigrid which overcomes this problem). 

It is interesting to mention that when D, rather than D, is used for the finite volume 
discretization in Example (10), Black Box Multigrid converges rapidly while AutoMUG 
diverges. However, in light of the remarks made in [2], it is not clear whether the resulting 
scheme is meaningful. 

Acknowledgment. The author wishes to thank Moshe Israeli for suggesting the physi- 
cal motivation for the restriction on the grid resolution and Irad Yavneh for his valuable 
comments. 


APPENDIX A 


Proof of Theorem 1: Let v be a common eigenvector of X,Y and A with the eigenvalues x, 
y and x + y, respectively. Then v is also an eigenvector of the iteration matrix of PAMUG(O) 
with the corresponding eigenvalue f r (x,y), where 


g{x,y ) 


_ (2-aQ(2 -y)(x + y) _ 
2(x(2-x) +y(2-y)) 

and f r (x,y) = (l - 


xy( 4 - x-y ) 
2(x{2 - x) + y(2 - y)) 

) 9(x,y). 


To prove the theorem, it is sufficient to bound |/ r | in the region 0 < x, y < 2. In this region, 
0 < \fr \ < 9 < 1- Since g is symmetric, it is natural to write it as a function of the symmetric 
variables c — x + y and d = xy. Clearly, (c, d) G (0, 4) x (0, 4), 


d) = 2(2c^— c 2 + 2d) and /rM = (l-5) f sM- 
The partial derivative of g with respect to d is 

dg . . _ (4 — c)(2c — c 2 + 2d) — 2d(4 — c) 

dd [C,d) = : 2(2c - c 2 + 2d) 2 

(4 — c)c(2 — c) 

2(2c — c 2 + 2d) 2 

Hence dg/dd > 0 if 0 < c < 2, dg/dd = 0 if c = 2 and dg/dd < 0 if 2 < c < 4. Assume that 
0 < c < 2. Then g achieves its maximum on the hyperbola xy = d for which d is maximal. 
This happens at the point x = y = c/2. But at this point we have g = c/ 4 and 

3 — c\ r c 



We find the maxima of h: 


'‘ ,(C) = G-^> C) = 0 

or 3 — c — cr — 0 or c = 3/(r + 1). The maximum of h in (0, 2) is thus 

h ( 3 ^ = 3 rT 
• \r + l) 4 (r + l) r+1 ’ 

The theorem follows from |/ r | < (^) = 3~ r in the region 2 < c < 4. □ 


APPENDIX B 


Proof of Corollary 1: For i € {0, 1}, define the injections O x< i and O y j by 


( QxjV)l,m — 


vi tJn l = i mod 2 
0 l ^ i mod 2 


and ( — 


vi, m rn — i mod 2 
0 m^i mod 2 


v e Vn 


(0 Xt i injects onto every other y - line and O y ^ injects onto every other x-line). Let v be a 
common eigenvector of X and Y with the corresponding eigenvalues x v and y v , respectively. 
Since X and Y are of property-A, it follows from [21], Sec. 7.1 that the following is a set of 
common eigenvectors of X and Y : 


W = \ Y. (-i)“ + ®o*,iOw4 

Uj€{ 0 ,l} J Q)| g e { 0 ,l} 

The elements of W are orthogonal to each other and have the same Z 2 norm. Denote by x w 
(resp., y w ) the eigenvalue of an element w £ W with respect to X (resp., Y). Define the set 
of vectors 

V = 0,1} • 

Define the symmetric orthogonal discrete Haar transform 

H = (lh, s ) lM{ h 7 , , = 

Hence W = HV and V = HW. Let Ma and Mp denote the iteration matrices of Auto- 
MUG(O) and PAMUG(O), respectively. Note that 0 T 0 = O x <oO y fl and that OMa = OM P . 
The assumption that a postsmoothing of the form x <— POx is performed is equivalent 
to replacing the substitution x out <— x in — Pe in (1) by x out <— P(Oxi n — e). From these 
observations and the proof of Theorem 1, it follows that, for any w £ W, 

MaW = fri^WiVw) (1 ^v) (1 — yv) j O x ,iO yJ V. 

i,je{ 0 , 1 } 

Consequently, span(W) is an invariant subspace of M A ■ Let Ma denote the restriction of 
Ma to span(W). The representation of Ma in the basis W is of the form Ma = 2~ 1 Hpu t , 
where p and u are the following four-dimensional vectors: 

p = (1) 1 x v , 1 y v , (1 x t) )(l y v )) and u — yw)')w&W' 
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Let p denote the spectral radius of a matrix. Then 

\\M a \\ < |p(pu*up*) 1/2 = i||p|| ||u|| < ||u|| < 2max\f r (x w ,y w )\. □ 


APPENDIX C 

Proof of Theorem 2: Let 

A 0 = A and Di = diag(Ai), 0 < i < n — 1. 

Consider the ?th call to the PAMUG procedure in the PAMUG method (1), 1 < i < n. T his 
call is designated to solve the equation Aj-je = f. For this equation, denote the two-level 
PAMUG iteration matrix by jVj_i and the multi-level PAMUG iteration matrix by Mj_i. 
For a PAMUG cycle with index e, we have (see [13]) M n -i = N n -i, and, for 0 < i < n — 1, 

Mi = (l-(I-Ml H )A£ 1 R l+l A i )(l-a- 1 DpA i y 

= A'. + (/ - -.-‘B, '.4,) r . 

It is easily seen by induction that all the operators AT R(Xi), UYiU and UR(Yi)U, for every 
i, are block diagonal with circulant Toeplitz blocks. Hence, all the operators A;, Di and Ri, 
for every i, are diagonalizable by the 2-dimensional discrete Fourier transform; hence, so are 
also the operators Ni and, by induction, also the operators M*. The theorem follows from 
spectral analysis. □ 
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SUMMARY 


We present a new genuinely multidimensional discretization for the compressible 
Euler equations. It is the only high-resolution scheme known to us where Gauss- 
Seidel relaxation is stable when applied as a smoother directly to the resulting high- 
resolution scheme. This allows us to construct a very simple and highly efficient 
multigrid steady-state solver. The scheme is formulated on triangular (possibly un- 
structured) meshes. 


INTRODUCTION 


One of the most challenging problems in numerical analysis was the construction of a 
numerical scheme for gas dynamics in one dimension. Such a scheme had to combine 
high-order accuracy in the regions of the smooth flow with the ability to represent 
discontinuities by thin oscillation-free layers. These two properties are not both at- 
tainable within the class of linear schemes (Godunov’s theorem). Therefore, the suc- 
cessful scheme should be non-linear. Schemes of this type were named high-resolution 
schemes. The discrete schemes for the equations of gas dynamics in multidimensions 
are usually obtained using the dimensional-splitting approach, i.e. applying a one- 
dimensional scheme in each coordinate direction. The main problem, however, is that 
the steady-state solvers based on such schemes suffer from poor computational effi- 
ciency. It was observed by Spekreijse [1] that such a simple and efficient smoother as 
pointwise Gauss Seidel relaxation is unstable in conjunction with such schemes even 
in the simple case of linear advection equation. The multigrid solvers, therefore, have 
to resort to multi-stage Runge-Kutta relaxation or to defect-correction techniques, 
which are not the really efficient ways to utilize the multigrid approach. 

‘This research was supported by the National Aeronautics and Space Administration under 
NASA Contract No. NAS 1-19480. 
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The reason for the fact that the Gauss-Seidel relaxation is unstable when applied 
in conjunction with the dimensionally-split high resolution schemes can be traced 
down to the particular way the nonlinearity is incorporated within these schemes. 
This motivated the search for a high resolution (at least at the steady-state) scheme, 
with the nonlinear high-resolution correction introduced in such a way that it does 
not lead to the instability of the Gauss-Seidel relaxation. This search resulted in 
the genuinely multidimensional advection scheme of the control volume type (see 
[2], [3]). The so-called fluctuation-splitting type schemes (for unstructured triangular 
meshes) were also introduced (see [4], [5]). A strong relationship between the two 
types was established in [6]. However, it was not clear for a long time how to extend 
these ideas to the systems of equations. One of the major directions was the so- 
called wave modeling (see [7], [8]). This approach concentrated on finding a way to 
represent (locally) the physics of two-dimensional flow of a compressible fluid by a 
finite number of simple waves, each one having an associated advection equation. 
However, numerical schemes created this way suffered from a lack of robustness. The 
approach introduced in [9] is concerned not with applying an advection scheme to 
discretize a system of equations in two dimensions, but rather with applying to the 
systems of equations the same strategy that was used when constructing a scalar 
advection scheme. The resulting genuinely two-dimensional scheme is formulated 
on triangular (possibly unstructured) meshes. The unique advantage of this high- 
resolution discretization is that the Collective Gauss-Seidel relaxation can be applied 
directly to the high resolution discrete equations. This results in a very simple and 
efficient multigrid steady-state solver. 

In this paper first we introduce some further enhancements to the scheme pre- 
sented in [9]. Numerical experiments will be presented. Some possible extensions of 
the truly multidimensional approach will be discussed. 

GENUINELY TWO-DIMENSIONAL ADVECTION SCHEME 

Consider a linear two-dimensional advection equation 

u t + au x + bu y = 0. (1) 

Consider the triangulation of the domain as illustrated on Fig.l. Denote by R the 
fluctuation (i.e., the residual of equation (1) on triangle T multiplied by the area of 
this triangle): 

R = R x + R y , (2) 

where 

R x — -|[a(w 0 - u 3 )] 

Rv = -l[b(u 3 -u 4 )]. 

The following fluctuation distribution formulae 

fc 2 < +1 = h - 2 < + §■ R x 

h 2 v% +1 = h?u% + \[R X + R y ] (3) 

h 2 v^ +1 = h 2 %c { +iRy 
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reproduce the central difference scheme, which is second-order accurate (in space) but 
is known to be unstable. 

We shall introduce here the positivity property. 

Definition 1. A scheme is said to be of the positive type if any solution value on 
the new time level obtained by this scheme can be written as a positive combination 
of the values from the previous time level. 

Solutions obtained by using positive schemes satisfy a certain maximum principle 
and, therefore, do not exhibit oscillatory behavior in the presence of discontinuities. 
It is obvious that the central scheme (3) is not of the positive type. 

Modifying (3) by adding the appropriate artificial viscosity terms 
1i 2 Uq + 1 = h 2 u'o + f [/^(l + sign(a))] 

h 2 u^ = h 2 u” + l[R r ( 1 - sign(a)) + ^(1 + sign(6))] (4) 

h 2 u” +1 = h 2 u 4 + y[-/? y (l - sign(6))] 

we recover the dimensional upwind scheme which is positive, but only first order 
accurate. 

Definition 2. The fluctuation-splitting scheme is called linearity preserving if 
whenever the fluctuation on the triangle T vanishes then the scheme leads to a zero 
update in. each of the three vertices of the triangle. 

The upwind scheme (4) does not satisfy this property since the fact that R = 0 
does not necessarily imply that R x = R !l = 0. Therefore, a non-zero update of the 
nodal values may be introduced. 

Introduce the following quantities 


R x * =R X + R y $(Q) 
R y * =R y + R x ^p- 

(5) 

where 

R x 

Q = -TT 

R y 

(6) 

and is a. Lipschitz continuous limiter function such that 


0 < V(Q) <1, 0 < < 1 

(7) 

and 


*(i) = i- 

(8) 


Substituting R x * ,R y * for R x ,R y into (4) satisfies the linearity preserving property. 
This can be demonstrated in the following way: assume that R. = 0. This means that 
R x = — R v or Q = —R x /R y = 1. It can be seen that no update will be introduced to 
any of the unknowns at the nodes of triangle T, provided the limiter-function satisfies 
the equality (8). This scheme is also second order accurate at the steady-state, since 
the grid considered here is structured (see [6]). 
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Using the following identity 


R y m(Q) = 

we can rewrite (5) in the following form 

R x * = /2*(1 - 
R y * = R y(i _ $(£)). 


( 9 ) 


(10) 


It is easy to see that the scheme defined by (4) and (5) (or 10) is of positive type, 
provided the inequality (7) holds. 

It is also obvious from (9) that such scheme is conservative because 

R x ' + R y * = R x + R y = R 


(for more details see [6]). 

MULTIDIMENSIONAL EULER SCHEME 


The Euler equations of gas dynamics in two dimensions can be written 

u t + F(u) x + G(u) y = o, 

where 



( P ) 


( pu \ 


( pv \ 

u = 

pu 

pv 

\ e j 

; F(u) = 

pu 2 + p 
puv 

V puH ) 

i G(u) = 

puv 

pv 2 + p 
V pvH / 

where the enthalpy H 

is defined by 





_ e + p _ c 2 u 2 + v 2 
p 7 - 1 + 2 


the speed of sound 


and the pressure 



u 2 + v 2 


P = (7~ l)(e~/> — ^ — ) 


( 11 ) 


(12) 


(13) 


(14) 


(15) 


The quasilinear non-conservative formulation of the Euler system in auxiliary vari- 
ables ( s,u,v,p ) can be introduced in two dimensions as well 


s t + us x + vs y = 0 

pu t + puu x + pvu y + p x = 0 

pV t + pUV X + pVVy + Py = 0 

Pt + up x + VPy + pc 2 (u x + Vy) = 0 


(16) 
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where ds = dp — . 

Remark 3. Note that the entropy (s) evolution is subject to the two-dimensional 
advection equation, which is locally decoupled from the rest of the system. 

The fluctuation of the system (11) defined over the triangle T is 

R = J J u t ‘= - J f(F x + G y ) dx dy = -S T [f* + G y ] (17) 

where F x , G y are some averaged values of the flux derivatives over the triangle T. 

Our construction of the truly two-dimensional Euler scheme utilizes the two- 
dimensional conservative linearization procedure [10]. We assume that the quantity 
which varies linearly over an element is the “parameter vector” 

m = v7>(l,tt,v, H) T (18) 

and its averaged value on the triangle T (as illustrated on Fig.l) is given by the 
following 

m 0 + m 3 + m 4 
m - 3 

Roe-averaged quantities can be introduced 

u = ih^lrhx 

v — mz/rhi (20) 

H = m 4 /mi 

and 

c 2 = ( 7 - 1 )[£- 1(« 2 + S 2 )]. ( 21 ) 

Fluctuations of the Euler system in the auxiliary variables can be presented as 



r — r x + r y , (22) 

where 


r x = -St A - {s x , pu x , pv x ,p x ) T 

r y = - s t b ■ ( 3 ^ ,pvr y ,pv~ y ,pj) T 


/ u 

0 

0 

0 ^ 


( 5 

0 

0 

0 \ 

0 

'll 

0 

1 

, B = 

0 

V 

0 

0 

0 

0 

u 

0 

0 

0 

V 

1 

V 0 

c 2 

0 

u } 


1 0 

0 

c 2 

v / 


and Sj = h 2 / 2 is the area of the triangle T, and 

pj = 2rh\(m\) x (23) 

pu x = fhi(m 2 ) x - rh 2 (mi) x (24) 

pv x = mi(m 3 ) r - m 3 (mi) x (25)' 

Px = l(m4{mi)x + dn(m 4 ) x ) + (m 2 (m 2 ) x + m 3 (m 3 ) x )}. (26) 

1 
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The corresponding terms involving derivatives in the y direction can be written in 
the analogous manner. 

Introducing the matrix 


/ 1 

0 

0 

1 / c 2 \ 

' u 

1 

0 

u/c 2 

V 

0 

1 

v/c 2 

(, (u 2 + v 2 )/2 

u 

V 

1/(7 - 1) + (u 2 + v 2 )/(2c 2 ) ) 


(27) 


we can define 

R x = C a r x 
R y = C a ry. 

It can be easily verified that _ 

R x = -S T F* 
R y = -S T G y , 


(28) 

(29) 


where F x ,G y are the same averaged flux derivative values as defined in [10]. It is 
also obvious that the entire fluctuation 


R = R x + R y = C a {r x + r y ) = C a r. 


(30) 


Consider triangle T as illustrated in Fig.l. The fluctuation is distributed according 
to the following formulae: 

Su% +1 =5< C a [r x (I - sign(i))] 

Su% +1 = Su% +zC a [r x (I + sign(A)) + r y (I - sign(jB))] (31) 

Su n 4 +1 =Su n 4 +iC a [r y {I + sign{B))} 

we obtain the scheme that is similar to the standard Roe dimensionally split scheme. 
The only difference is in the linearization procedure. 

We can construct now a (linearity preserving) second order accurate scheme. First, 
we shall introduce vectors r x , r y with their elements defined by 

rf = rf + $(qi)r? 

r f _ r v , v(<a) r x 

for i = 1,2, 3 , 4, where 



and ^ is a (non-compressive) limiter. 

Substituting r x ' ,r y * for r x ,r y in (31) we obtain a genuinely two-dimensional 
scheme, which is also linearity preserving (second order accurate in this case) and 
conservative. 

Some attributes and properties of the genuinely multidimensional schemes will be 
discussed later in [9]. In order to obtain an efficient implementation of the scheme 
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described above, it is important to write down the explicit expressions for the matrices 
sign(/l), sign(jB). Denote 

M x - sign (A) 

M y = sign (B). 

For matrix M r the distinction should be made between two cases 



M — { ’ 

x \ Mr, 

if \u\ < c 
if |£i| > c, 

(34) 

and similarly 

( jirsub 

y \ M sup , 

if |u| < c. 
if |u| > c, 

(35) 

where 





Mf up = sign («)/, 

(36) 


= sign (v)I 

(37) 


and I is the 4x4 unity matrix. These matrices for the subsonic case appear to be 
surprisingly simple as well 



( 

sign(ii) 

0 

0 


0 

\ 

M sub — 


0 

0 

0 

0 

0 

sign( 

u) 

1/c 

0 



\ 

0 

c 

0 


0 

/ 


/ 

sign (5) 


0 

0 

0 

\ 

M sub — 


0 

sign(u) 

0 

0 


m y 


0 


0 

0 

1/c 



V 

0 


0 

c 

0 

/ 


(38) 


(39) 


Their structure indicates that there are some intriguing similarities between the stan- 
dard schemes used for incompressible flow computations and the multidimensional 
upwind scheme presented above (see [9]). 

Remark 4. The scheme formidated here can be extended to the case of general 
unstructured grids in a straightforward way. Having a general triangular element, one 
has to introduce a new (possibly non-orthogonal) coordinate system whose axes align 
with two chosen faces of this element (Fig. 2). The Euler system has to be rewritten in 
these new coordinates. Then one can follow directly the procedure of constructing the 
fluctuation distribution formulae presented in this section (see [9] for more details). 


NUMERICAL EXPERIMENTS 


The purpose of the numerical experiments reported in this section is to verify the 
robustness of the constructed scheme and the quality of the numerical solutions ob- 
tained by its means. Some experiments illustrating the performance of the multigrid 
algorithm using this scheme are presented as well. 
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Supersonic flow in a channel with a bump 


The test case considered here is a supersonic (Mach=2.9) flow in a channel with a 
circular bump. The bump is located at the lower wall of the channel at 1 < x < 2, 
and its surface is a circular arch of ir/3 and radius 1. Note that the actual shape of 
the domain is a rectangle. The influence of the bump on the flow is imposed through 
the boundary conditions: the velocity component normal to the surface of the bump 
at a certain location is being reflected. 

The first experiment uses a grid of size 200 x 40 points. The density contour plots 
of the steady-state solution are presented on Fig.3(a). The scheme used is the one 
given by (31), (32) with the minmod limiter. 

The second experiment presented in Fig. 3(b) corresponds to the same settings, 
except that the grid is twice finer (400 x 80 points). As is expected, the grid refinement 
results in a better resolution of the flow features. 


Transonic flow over a circular bump 


The test case considered here is a transonic flow (free-stream Mach= .9) over a flat 
wall with a bump (Fig.4). The surface of the bump is a circular arch of 7r/3 and radius 
1 and its location is between 3.5 < x < 4.5. Again, in order to keep the experiments 
simple at this stage of work, the bump is treated the same way as in the previous 
experiments. The grid is 200 x 200 points. The shock of the “fish-tail” shape can be 
clearly observed in Fig.4. 

Low Mach number flow over a circular bump 

Here we present a numerical experiment concerning a low Mach number (=.l) flow 
over a flat wall with a circular (arch of 7t/3 and radius 2) bump. Here as well as in the 
previous case the presence of the bump is imitated through the appropriate boundary 
conditions. The grid is 200 x 200 points. The density contours of the steady-state 
solution are presented in Fig. 5. 


Multigrid algorithm 

To illustrate the performance of the multigrid algorithm we consider here the well 
known test case of a shock reflecting from a flat wall. The multigrid algorithm involves 
five grids (levels): the finest consists of 129 x 33 points, the coarsest is 9 x 3 points. 

The multigrid algorithm is based on the same two-dimensional scheme used with 
the lexicographic Gauss-Seidel relaxation. The restriction and prolongation proce- 
dures are the standard Full Weighting of the residuals and bilinear correction inter- 
polation. The numerical solution to this problem obtained by the 2 FMG — W( 2, 1) 
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algorithm is presented on Fig. 6(a). Fig. 6(b) presents the numerical solution obtained 
using the same algorithm but performing three more cycles (five total) on the finest 
level. 

Note that in this case the flow is aligned with the ^-direction in a significant 
part of the domain. In this case the artificial viscosity in the cross-stream direction 
in the entropy and u-momentum equations vanishes. Therefore, no smoothing can 
be obtained in the y-direction in some components. A multigrid algorithm utilizing 
the time-stepping type relaxation can deal with such a situation only using the semi- 
coarsening technique. Our algorithm employs the Gauss-Seidel relaxation. Therefore, 
it offers a much simpler and more efficient treatment of this problem: relaxation with 
lexicographic ordering in the stream direction. 

The rate of convergence observed in this test case as well as in other simple 
experiments concerning a variety of flow regimes is very close to .75. 

DISCUSSION AND FUTURE WORK 


Summary of the current work 

A new two-dimensional high-resolution (at the steady-state) scheme for the compress- 
ible Euler equations was presented. It is triangle-based and can be formulated with 
the same degree of simplicity both on structured and unstructured grids. The main 
advantage of this scheme is that Gauss-Seidel relaxation can be applied directly to 
the resulting discrete equations. This allows construction of a simple and efficient 
multigrid steady-state solver. 

A remarkable property of the constructed scheme is also its very compact stencil: 
it involves only the immediate neighbors of the point of interest. 

A variety of flow regimes (supersonic, transonic and low Mach number flow) were 
considered in the numerical experiments to verify the quality of the solutions ob- 
tained by means of the new scheme and to demonstrate the efficiency of the multigrid 
algorithm. 

Generalization of this scheme to three dimensional tetrahedral meshes is straight- 
forward (see [9]). 


Further improvement of the multigrid efficiency 

The main obstacle preventing the further improvement of the multigrid efficiency 
is the following fact: for the hyperbolic problems the coarse grid correction is not 
sufficient for certain error components. 

This difficulty was already addressed in the literature and some techniques to 
improve the multigrid efficiency were developed in [11]. Therefore, one possibility is 
to adapt these techniques for our case - compressible Euler equations. 
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Figure 4: Transonic flow over a wall with a circular bump (free stream Mach= .9). 
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ALGEBRAIC MULTIGRID BY SMOOTHED AGGREGATION 
FOR SECOND AND FOURTH ORDER ELLIPTIC PROBLEMS* 

PETR VANEK, JAN MANDEL, AND MARIAN BREZINAt 

Summary. An algebraic multigrid algorithm is developed based on prolongations by smoothed ag- 
gregation. Coarse levels are generated automatically. Guidelines for the selection of method components 
are presented based on energy considerations. Efficiency of the resulting algorithm is demonstrated by 
computational results. 

Key words. Algebraic multigrid, unstructured meshes, automatic coarsening, biharmonic equation 

AMS(MOS) subject classifications. 65N55, 65F10 

1. Introduction. Multigrid methods are very efficient iterative solvers for systems 
of algebraic equations arising from finite element and finite difference discretizations 
of elliptic boundary value problems. The main principle of multigrid methods is to 
complement the local exchange of information in point-wise iterative methods by a global 
one utilizing several related systems, called coarse levels, with a smaller number of 
variables. The coarse levels are often obtained as a hierarchy of discretizations with 
different characteristic meshsizes, but this requires that the discretization is controlled 
by the iterative method. To solve linear systems produced by existing finite element 
software, one needs to create an artificial hierarchy of coarse problems. The principal 
issue is then to obtain computational complexity and approximation properties similar 
to those for nested meshes, using only information in the matrix of the system and as 
little extra information as possible. 

Such algebraic multigrid method that uses the system matrix only was developed 
by Ruge, et al. [10, 4, 11]. The prolongations were based on the matrix of the system 
by partial solution from given values at selected coarse points [1]. The coarse grid 
points were selected so that each point would be interpolated to via so-called strong 
connections. 

Our approach is based on smoothed aggregation introduced recently by Vanek [14, 
13]. First the set of nodes is decomposed into small mutually disjoint subsets. A tent- 
ative piecewise constant interpolation (in the discrete sense) is then defined on those 
subsets as piecewise constant for second order problems, and piecewise linear for fourth 
order problems. The prolongation operator is then obtained by smoothing the output of 
the tentative prolongation and coarse level operators are defined variationally. Multigrid 
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method based on such prolongations converges very fast for a wide range of problems 
including those with strongly anisotropic and discontinuous coefficients and, in addition, 
it has a remarkably low computational complexity since the typical coarsening ratio is 
about three in each dimensiqn. 

Almost optimal theoretical bounds for our method were given by the authors in [15] 
for second order problems and under natural assumptions on the coarse level hierarchy 
that tend to be satisfied by our coarsening algorithm, namely that the coarsening is by 
about the factor of three, and that the aggregates of the nodes are based on aggregated 
elements that form a reasonable mesh of macroelements. A bound on the energy of 
the coarse level basis functions was proved and used to verify the assumptions of the 
multilevel regularity-free approach of Bramble, Pasciak, Wang, and Xu [3]. The theory 
can be extended to fourth order problems once similar energy bounds are available for 
that case. 

The part of this paper dealing with second order problems is based on [15]. The 
algorithm for fourth order problems is new. For more details and theory for the second 
order case, see [15]. 

For other multigrid approaches to the biharmonic equation, see [5, 9, 16, 8]. For 
a multigrid theory for the biharmonic equation with non-nested finite element spaces, 
see [2]. 

1.1. Basic Multigrid Algorithm. For reference, we state the basic multigrid 
algorithm for the solution of the system of linear algebraic equations Ax = b. First, 
a preprocessing stage creates full rank prolongation matrices Pi of size nt x n/ + i, l = 
1,...,L — 1 by an automatic coarsening process described below. The coarse level 
matrices are defined by 

A\ = A, Ai + 1 = Pj AiPi,l = 1, . . . , L — 1. 

The iterations then proceed as follows. 

Algorithm 1 (Basic multigrid). To solve the system A\x l = b l , do: 
Pre-smoothing: do iq times x l <r~ S l (x l , b l ) 

Coarse grid correction: 

• let b l+1 <—P^(b l — Aix l ) 

• If l + 1 = L, solve Ai +i x l+1 = b I+1 by a direct method, otherwise apply 
7 iterations of this algorithm on level l + 1, starting with initial guess 
x l+1 = 0 

• correct the solution on level l by x l <—x l + Pix l+1 
Post-smoothing: do times x l <r-S l {x l ,b l ). 

We use ui = 1/2 = 7 = 1 with the pre-smoothing iteration consisting of one forward 
iteration of the Gauss-Seidel followed by one iteration of backward SOR. The post- 
smoothing iteration consists of one forward SOR iteration followed by an iteration of 
backward Gauss-Seidel. The over-relaxation parameter used is 1.85 in both pre- and 
post-smoothing. 
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Each level is associated with basis functions {<£>;}r=i- The basis functions on the 
finest level are given as finite element shape functions, while the coarse level basis 
functions are determined from the prolongations by 


k = 1, . . . , L 


1 


2. Algebraic Multigrid for Second Order Problems. Consider discretization 
by standard conforming linear finite elements of a second order elliptic variational prob- 
lem 

u£V: a(u,v) = f(v ) Vv € V (2.1) 

where V = H^ D (Q,) denotes the Sobolev space of H 1 functions vanishing on r^Cdf), 
^(Fd) > cfi(d f2), 0, a domain in IR 2 . The bilinear form 

a(u , v) = f y~] aijdiudjV ( 2 . 2 ) 

Ja 

is assumed to be symmetric, V-elliptic, and bounded, 

cilMItfqu) — o>(u,u) < C 2 ^ u € V. (2-3) 

Moreover we assume that the finite element basis forms a decomposition of unity 


£W = i 

1=1 


away from essential boundary conditions. 


2.1. Construction of Prolongations for second order elliptic problems. 
The prolongation operators are chosen to achieve low energy of coarse basis functions, 
leading to good theoretical estimates of the convergence of the iterations, as well as by 
sparsity considerations to achieve low computational complexity of the iterations. We 
are looking for prolongations that satisfy the following properties. First we specify the 
desired properties of the support of the coarse shape functions (or, equivalently, the 
allowed nonzeros of the prolongation matrices), and then the numerical values of the 
nonzero entries. 

(AMG1) Coarse supports should follow strong couplings. We require that every 
two nodes in the support of a coarse basis function can be connected by a path 
of strong couplings. Two nodes i and j on level / are strongly coupled if |aF| 

is relatively large compared with Essentially, we want to assure that 

the algorithm will provide the semi-coarsening in the case of solving of the an- 
isotropic problem ( [6], [12] ). Algebraically, the anisotropy is reflected in the 
coefficients of the stiffness matrix in the sense that the neighboring nodes are 
strongly coupled in the direction of anisotropy. 
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(AMG2) Bounded intersection. Support of each basis function intersects a bounded 
number of supports of other basis functions on the same level only. The number 
of intersections does not depend on the level. This property guarantees sparsity 
of the resulting coarse-level matrices. 

(AMG3) Decomposition of unity. Every coarse space Vi should represent the con- 
stant function exactly, aside from an essential boundary condition. This re- 
quirement is motivated by the need to bound locally the error of a coarse grid 
approximation Pjv l+1 of a fine grid function u l in terms of the energy (u l ) T Aiu 1 
and by the fact that the constant function has zero energy because of (2.2). 
Because of (2.4), this is equivalent to the requirement that the columns of each 
prolongation matrix form a decomposition of unity 

»!+l 

Y, p ‘i = l < i = 1 £-!. 

j=i 

for all rows i that do not correspond to degrees of freedom adjacent to an 
essential boundary condition. For generalizations, see Sections 3.1 and 3.3. 
(AMG4) Small energy of coarse basis functions. We require that the energy of 
the coarse space basis functions be almost minimal in the sense that 

¥>') ^ :_r «K U ) 

1 1 ‘r’; llz, 2 (ft) uetf^supp^) INIz, 2 (n) 

Note that in the case of uniformly V-elliptic problems the requirement above, 
together with bounded intersections of supports of basis functions (AMG2), 
assures the standard inverse inequality on each coarse space. 

(AMG5) Uniform l 2 equivalence. Discrete l 2 norms on all spaces V/ should be uni- 
formly equivalent up to diagonal scaling. The scaling may depend on the meas- 
ure of the support of basis function and type of degree of freedom. For the 
algorithm described in this section, such uniform equivalence has been proved 
in [15]. 

We now construct prolongations Pi based on the matrix Ai. First we create a 
tentative piecewise constant prolongator satisfying all of the above properties except 
for the energy bound in (AMG4). This prolongator will then be smoothed to satisfy 
(AMG4), while preserving the other properties. 

We start by specifying a disjoint decomposition of the set of nodes on level l. Every 
component of the decomposition on level l ( so-called aggregate ) gives rise to one degree 
of freedom on level / T 1. 

Motivated by the requirement (AMGl) above, for a given e we define the strongly- 
coupled neighborhood of node i as 

N\{e) = {j : \aij\ > e^/a tl a 33 ) U {?'} (2.5) 


724 



Algorithm 2 (Aggregation). Let the matrix Ai of order ni and e e [0,1) be 

given. Generate a disjoint covering of the set {1, . . . , n/} as follows. 

Initialization Set R = {1, , n/} and j = 0. 

Step 1 Select disjoint strongly coupled neighborhoods as the initial attempted cover- 
ing: If there exists a strongly coupled neighborhood N-(e) C R, set jt—j + 1, 
C l j<—N-(£), R*—R \ Cj. Repeat until R does not contain any strongly coupled 
neighborhood. 

Step 2 Add each remaining i £ R to one of the sets already selected to which it is 
strongly connected, if possible: 

Copy C l k = Cl, k = l,...,j 

If there exists i £ R and k such that N-(e ) D C k ^ 0 then set C k i-C k U {*}. 

Repeat until no such i exists. 

Step 3 Make the remaining i £ R into aggregates that consist of subsets of strongly 
coupled neighborhoods: If there exists i £ R, set jt—j + 1 and Cj = Rf I N‘(e). 
Repeat until R = 0. 

Define the tentative prolongation P/ by the aggregates Cj: 


(Pi)* = 


1 if * £ C\ 

0 otherwise 


(2.6) 


The piecewise constant prolongation Pi will now be improved by a smoothing to 
get the final prolongation matrix Pi. We choose a simple Jacobi smoother, giving the 
prolongation matrix 

Pi = (J -uD- l Af)Pi 

where Af («J) is the filtered matrix given by 

nj 

if * ~f~ J, “ii = au- XJ ( a b ~ 

and D denotes the diagonal of Af. 

When applying Algorithm 2 to uniformly elliptic problems, one usually obtains the 
coarsening by about a factor of 3 in each dimension and the resulting coarse level matrix 
Ai + 1 tends to follow the nonzero pattern of the 9-point stencil. The filtration (2.8) has 
little or no effect in this case. 

In the case of anisotropic problems, however, the application of the smoother with 
the unfiltered matrix would make the supports of basis functions overlap extensively in 
the direction of weak connections. Here the filtration prevents the undesired overlaps 
of the coarse space basis functions. By construction, Af typically makes the nonzero 
pattern of Ai+\ follow the 9-point stencil as in the uniformly V-elliptic case. It also 
assures that a constant remains the local kernel of Af at every point where constant 



F = f ^ 3 € N\{e) ) 
u ( 0 otherwise J 


(2.7) 
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Fig. 2.1. The basis functions given by aggregation and the corresponding smoothed basis 
for ID Laplacian, using the smoother I — 2/3 D -1 A. 


is the local kernel of A/. Consequently, for problems without zero-order term the final 
prolongator Pi satisfies the decomposition of unity away from the essential boundary 
conditions. 

Fig. 2.1 shows the ID coarse basis functions resulting from prolongation by aggrega- 
tion and the smoothed aggregation. Note that for the ID Laplace operator and the choice 
of u — 2/3 in (2.7), the smoothed coarse space basis is exactly the one of PI -finite 
elements. Fig. 2.2 shows the typical aggregates obtained on an unstructured grid. The 
corresponding supports are formed by adding one belt of elements to the aggregates. 
The smoothing adds at most one more belt of adjacent elements. 

We choose 


£ = 0 . 08 (i)'-\ u,= |. 

The theory for the above method can be found in [15]. 

3. Generalizations. 

3.1. High order elements and unsealed problems. The decomposition of 
unity (2.4) may be violated in practice. In such a case, in order to construct coarse 
spaces representing the constant function exactly, we need the representation of unity 
with respect to finite element basis of finest space Vi as user input data. More specific- 
ally, we need the vector a £ IR ni satisfying 


= 1 


away from essential boundary conditions. 
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Fig. 2.2. Typical 2D aggregates. 


The definition (2.6) of the auxiliary prolongators remains in place for all levels but 
level 1; we define Pi as 



ai if i € Cj 

0 otherwise 


(3.1) 


Thus, the unit constant function is represented by the vector a = ( a.i)i=i on lh e finest 
level, while on levels 2 to L, the unit constant function is represented by vectors of 
all ones. The process can be easily generalized to the nonscalar case using the block 
approach described in Section 3.2. It was applied to the problem from Example No. 1 
of Section 5 modified by scaling the basis functions randomly in the interval [0.01,1]. 
The results are summed up in Example No. 5. 


3.2. Vector problems. In the case of nonscalar problems, the coarsening al- 
gorithm as described in Section 2 is likely to produce aggregates of physically incom- 
patible degrees of freedom causing deterioration of convergence. This phenomenon can, 
however, be overcome by using so-called block approach, which consists in replacing the 
scalar operations on the level of degree of freedom by their block counterparts on the 
level of node. Let rid denote the number of degrees of freedom per node ( assumed to be 
constant ) and df(i) be the list of degrees of freedom associated with the node i. The 
communication between the neighboring nodes k, l can now be expressed in the form of 
a matrix selection Am of order rid 


Am = A(df(k),df(l)). (3.2) 

The definition of strongly coupled neighborhood of node i (2.5) is now replaced by 

= u {■}, ( 3 -3) 
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where ||.|| is a matrix norm. Further, in the definition of auxiliary prolongations (2.6), 
we replace the numbers 1 and 0 by identity and zero matrices of order n<i, respectively. 
The efficiency of this generalization is demonstrated by Experiments No. 1 and No. 5 
in Section 5. 

3.3. Absolute term. Consider now (2.1) modified by adding a positive absolute 
term 

a(u, v) = I dijdiudjV + quv, q > 0. 

Ja iJ 

In this case, the prolongation smoothers lose its constant-preserving property because 
the constant is no longer locally in the kernel of Ai . Fortunately, the presence of the 
absolute term improves the condition number of Ai, thus compensating for the loss of 
the preservation of a constant. 

For large q , the absolute term also has the effect of boosting the diagonal dominance 
in certain (block) columns. The nodes corresponding to these columns are then treated 
by Algorithm 2 as isolated nodes, and the coarsening process may stall. Note that the 
same phenomenon may also result from certain treatments of the essential boundary 
conditions. This difficulty can easily be defeated by a simple modification. Removing 
these nodes from the set R in Algorithm 2 prevents the stalling. At the same time, it 
does not harm the convergence of the overall method, because the smoothers S l are very 
efficient at approximating values in numerically isolated nodes. 

4. Method for High order problems. For the elliptic problems of order 2 K, K > 
1 requirements on prolongators have to be slightly stronger. Instead of decomposition 
of unity (AMG3) we now need the more general requirement. 

(AMG3’) Every coarse space Vi must represent polynomials of degrees up to I\ — 1 
exactly, away from the essential boundary conditions. As in the case of second order 
problems, this requirement is motivated by the need to control the coarse-grid approx- 
imation of Piv l+1 of u l by energy ( u l ) T AiUi and by the fact that norm and seminorm are 
equivalent on the factor space H K modulo polynomials of degree of up to K — 1. 

Second, the small energy of coarse basis functions (AMG4) must be replaced by 
its straightforward generalization. 

(AMG4’) We require that the energy of coarse space basis functions be almost 
minimal in the sense that 

I Iv 5 ; 1 1 1,2 (ft) uSHfi suppc^l) ll u llL 2 (ft) 

Unfortunately, the construction of prolongators resulting in the coarse spaces satis- 
fying (AMG3 5 ) for K > 1 is not possible without additional user input. In order to be 
able to approximate the polynomials with degrees of up to K — 1 by coarse space func- 
tions exactly, we need their representation with respect to the finest level basis 
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Finally, assumption (AMG5) may be satisfied with different scaling for each type of 
degree of freedom. 

For the elliptic problem of order 2 K on the domain (l C IR d , we need vectors 
p(°),pW) e 1R” 1 , i = 1 . . . , K — 1, j = 1,. . .d satisfying 

7l\ n i 

= 1, f^Pk J)( Pk = x) (4.1) 

fc=i fc=i 

away from the essential boundary conditions. For example, to solve the biharmonic 
equation in 2D, we need j/ 11 ), and p^ 12 \ the representations with respect to the 
fine-level basis of the planes z = 1, z = x, z = y, respectively. 

The coarsening technique we are using is a natural generalization of the concept 
of smoothed aggregation described in Section 2.1. The aggregation step (2.6) can be 
viewed as a restriction of the unit vector to aggregates Cj, which gives rise to one degree 
of freedom on the level /+1 for each Cj. Here, tentative prolongators will be generated by 
restricting all the vectors p°, p? k to the aggregates Cj. Each aggregate will be represented 
by a set of degrees of freedom, where every degree of freedom corresponds to one of 
the vectors p(°\ p^ k l (see Fig. 4.1). The shape of the basis functions derived from the 
nonconstant polynomials depends on the position of the aggregate. More specifically, 
being far away from the origin, basis functions derived from polynomials of higher degree 
contain a large low degree polynomial component which results in the violation of the 
uniform equivalence of discrete and continuous L? — norms. This undesirable effect is 
suppressed by a local / 2 Gram-Schmidt orthogonalization process performed on each 
aggregate C} (see Fig. 4.2). Again, the resulting prolongator will be smoothed by the 
Jacobi smoother (see Fig. 4.3). 



Fig. 4.1. The coarse-space basis given by the restriction of p° and p 11 onto aggregates of nodes. 

The following is a generalization of the algorithm of Section 2.1 to the case of 
problems of order 2 K, K > 1. 

Algorithm 3 (Coarsening of high-order problems). We assume the num- 
ber of degrees of freedom per node on the finest level to be constant. Let n\ be the number 
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Fig. 4 . 2 . The coarse-space basis after I 2 Gram-Schmidt modification. 

y= l 



of nodes on the finest level, and df l {i) denotes the list of degrees of freedom associated 
with the node i. We set p 1 ^ 0 ) = = p^\ i = 1, . . . , K — 1, j = 1, . . . , d ( see 

(4-W- 

Step 1 - Decomposition. Generate the disjoint covering of the set of nodes 

{1, . . . , re; } using the Algorithm 2, where the strongly coupled neighborhood of 
i is defined by (3.3) and A(j is the selection A l (df l (i),df l (j)). 

Step 2 - Restriction. For each aggregate Cl define the index set D\ of all degrees of 
freedom associated with nodes in C\, i.e. 

D\ = U <V‘U)- 

iec‘ 


For every D\ generate auxiliary sparse vectors , v l ' t,np by 


J,i, 1 


P 1 ' {0) \d‘, v 


l,i, 2 


P 1,{1,1) \d‘, 


J,i,3 


P l ’ {h2) u 


,,v 




P l ^- hd) u 


where 2K is the order of equation, d is the number of space variables (T2cIR rf / ), 
and n p — ( K — l)d + 1 is the number of the user supplied polynomials. v\i 
denotes the restriction of the vector to the index set in the sense that = V{ 

if i £ I, zero otherwise. 

Step 3 - Gram-Schmidt modification. For each aggregate C\ update the set of as- 
sociated sparse vectors generated in Step 2 by I 2 Gram-Schmidt orthogonaliza- 
tion process in the ordering v l,hl , v l ' h2 ,..., v l,hnp ( i.e., vectors derived from 
low-degree polynomials are processed first ). Note that the representation of the 
unity v l,hl remains unchanged by the process. 

Step 4 - Building of auxiliary prolongators. Generate the auxiliary prolongator P\ 
whose n p (i — 1) + j-th column consists of the vector v l,lJ and create the corres- 
ponding coarse-level list of degrees of freedom associated with node i 


df l+1 (i) = {n p (i- 1) + 1, n p (i — 1) + 2, , n p i}, i = 1, . . . , re/ +1 . 
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Step 5 - Representation of polynomials on the coarse-level. Generate vectors 

p/+i,(o)^p/+i,(ii)^ . sa ti s fyi n g 

p i,(o) _ p ip i+ 1,(0) ^/.(u) _ ^ /+i,( u) ? ? p UK-i,d) _ p lp i+h{K-i,d)' 

As {C-} is a disjoint covering, the columns corresponding to different aggregates 
are I 2 - orthogonal and consequently, the global Gram matrix given by columns 
of Pi is a block matrix. Therefore, p i+1 ’(°), p i+1, ( u ), . . . can be com- 

puted by solving the local problems with Gram matrices generated by the columns 
of prolongator Pi associated with Cl 

G! = 

Step 6 - Final smoothing. Improve the prolongator Pi by smoothing step (2.7), (2.8), 
where scalar entries aij are replaced by blocks = A l (df l (i),df l (j)). 

REMARK 4.1. Note that the final smoothed coarse basis functions resemble the 
standard shape functions for the Hermitean element with one degree of freedom for the 
value at the node and one degree of freedom for each derivative. This is true regardless 
of the choice of basis functions in the original problem (finest level), and makes an 
algebraic coarsening possible. 

For the results of application of Algorithm 3, see Experiments 6 and 7 in Section 5. 

REMARK 4.2. Efficient solution in the case of nonscalar problems of second order 
may also need the use of the coarsenig technique described in this section. For example, 
in the case of 3D elasticity, the energy norm is not equivalent to (i7 1 ) 3 -seminorm on the 
factorspace modulo constant in each field in the local sense, and consequently, the ap- 
proximation property of the coarse space depends on the global constant of V-ellipticity, 
which can be very small if, for example, displacements are prescribed only on a rather 
small part of the boundary. 

In order to eliminate the dependence of the convergence on boundary conditions, 
we need the prolongator to support the local kernels of the form, which will typically 
assure the desired local equivalence on the factorspace modulo kernel ( i.e., local Korn’s 
inequality on macroelements ). 

Thus, it is reasonable to build prolongators supporting the entire local kernels of the 
bilinear form instead of just a constant in each field. This can be achieved by supplying 

the representation of the basis vectors of the kernel in place of the vectors pl 0 ),^ 11 ) 

A similar technique that builds the coarse space from local generators of the nullspace 
is used in the so-called Balancing Domain Decomposition [7]. 

5. Numerical Experiments . The experiments in this section demonstrate the 
favorable behavior of the method. The code is available through anonymous ftp to 
tiger.denver.colorado.edu , directory /pub/faculty/pvanek. The experiments were 
performed on an IBM RS-6000/360 with 128 MBytes of memory. 
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experiment No. 

rate of convergence 

algebraic complexity 

CPU time 

real time 

1 

0.08 

1.23 

5s 

5s 

2 

0.10 

1.56 

768s 

7892s 

3 

0.21 

1.14 

134s 

233s 

4 a/b/c 

0.11/0.10/0.10 

1.65/1.65/1.65 

85/85/85s 

95/96/91s 

5 

0.09 

1.24 

13s 

13s 

6 

0.26 

1.37 

64s 

77s 

7 

0.31 

1.48 

114s 

121s 


Table 5.1 

Results of numerical experiments. 


The residual was measured in the l 2 norm. The iteration process was stopped once 
the relative residual became smaller than 10~ 5 . In all the experiments 1/(1, 1) cycle 
has been used. By algebraic complexity we mean the number of nonzero entries in the 
matrices on all the levels divided by the number of nonzeros in the matrix on finest level. 

The rate of convergence is computed as an average reduction of / 2 -norm of residual 
per iteration. 

Results of experiments are summed up in Table 5.1. The description of testing 
problems follows. 

EXPERIMENT No. 1: Planar elasticity on unstructured mesh (Fig. 5.1). Poisson 
ratio 0.3, number of nodes 10610, number of degrees of freedom 21358. Boundary 
conditions : Dirichlet and Neumann. 

EXPERIMENT No. 2: Large anisotropic problem (5.1) with jumps in coefficients as 
in Fig. 5.2 and q(x , y) = 0. Number of nodes 10 6 . The problem has been discretized on 
the regular square grid. 

EXPERIMENT No. 3: 3D problem (5.2) with random coefficients 

u;n = exp(rni), w 2 2 = exp(rn 2 ), w 33 = exp (rn 3 ), 

where rn, is a random number uniformly distributed in the interval [ln(10 — 2 ), ln(10 2 )]. 
Number of nodes 68921. The problem was discretized on the regular square grid. 

- ~5y + ^ x ->y) u = f( x ^y) on (° 5 1) x (0, 1), _ _ 

ii = 0 on dQ,. 


£?,j= 1 Jfc( w ij( x >y)xf) = f{x,y), on (0,1) x (0,1), 


, du 


u 


0 on dfl. 


(5.2) 
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Fig. 5.1. Mesh 1 ( Courtesy of Charbel Farhat, Center for Aerospace Engineering, University of 
Colorado, Boulder). 


EXPERIMENT No. 4: 2D anisotropic problem (5.1) with jumps in coefficients as in 
Fig. 5.2 and a) q(x,y) = 0.1, b) q(x,y) = 1, c) q(x,y) = 10. Number of nodes 160000. 
The problem was discretized on the regular square grid. 

EXPERIMENT No. 5: Planar elasticity on an unstructured mesh (Fig. 5.1) dis- 
cretized by finite elements with randomly scaled basis. Poisson ratio 0.3, number of 
nodes 10610, number of degrees of freedom 21358. Boundary conditions : Dirichlet and 
Neumann. 

EXPERIMENT No. 6: Biharmonic problem discretized on the rectangular square 
grid. Number of degrees of freedom 48400. Boundary conditions: essential. 

EXPERIMENT No. 7: Fourth order problem (5.3) with coefficients given by (5.4) 
discretized on regular square grid. Number of degrees of freedom 48400. Boundary 
conditions: essential. 


d 2 
dx 2 




dx 2 



f(x,y) on (0,1) x (0,1) 


(5.3) 
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a = 1(T 2 


b= 10 2 

a = 10 2 

a = 1 

b = io- 2 

6=1 



Fig. 5.2. The coefficients a(x , y ) , b(x,y). 


a(x,y) = 1, b{x,y) = e 16xy (5.4) 

The second order problems are discretized by PI finite elements. The fourth order 
problems are discretized by a 27-point difference formula with Lagrangean degrees of 
freedom. 
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Abstract 

We consider numerical solution methods for the incompressible Navier- 
Stokes equations discretized by a finite volume method on staggered grids in 
general coordinates. We use Krylov subspace and multigrid methods as well 
as their combinations. Numerical experiments are carried out on a scalar and 
a vector computer. Robustness and efficiency of these methods are studied. 
It appears that good methods result from suitable combinations of GCR and 
multigrid methods. 


1 Introduction 


We compare various iterative methods for linear systems resulting from discretization 
of the time-dependent incompressible Navier-Stokes equations. Before discretization 
the physical domain is mapped onto a computational domain consisting of a number 
of rectangular blocks. In this paper we restrict ourselves to the one-block case and 
two space dimensions. For the space discretization we use finite volumes and a stag- 
gered grid. For the time discretization we use the Euler Backward finite difference 
scheme together with pressure correction. 

Krylov subspace and multigrid methods are two types of promising iterative meth- 
ods for the solution of large unsymmetric non-diagonally dominant linear systems 
of algebraic equations. These types of methods are much used to solve discretized 
Navier-Stokes equations. Our research using Krylov subspace methods is described in 
([10], [11], [12]) and using multigrid methods is described in ([14], [15], [16], [4] - [6]). 
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As Krylov subspace method we choose the GMRESR method [9] (a combination of 
GCR [1] and GMRES [7]). For the multigrid method we use a Galerkin coarse grid 
approximation and two different smoothers. 

Since many of the faster computers are vector computers, we also compare the vec- 
torization properties of the different methods. Although probably in the near fu- 
ture parallel computers will supersede vector computers, the comparison will remain 
relevant because good vectorization properties imply in many cases good parallel- 
lization properties. Furthermore, vectorization aspects remain of interest because 
future high-performance parallel computing platforms will often contain vector pro- 
cessors. Finally, good vectorization normally implies good superscalar- performance 
on many RISC processors. Note that GMRESR is easy to vectorize, since most of its 
arithmetic operations are vector updates, vector- vector and matrix- vector operations. 
Vector length becomes large as the grid is refined, which improves speed on vector 
computers. With respect to multigrid we have the following choices: 

- use of a simple smoother, like point Jacobi, which is easily vectorized but not 
robust, or 

- use of a more complicated smoother, like ILU, which is robust but harder to 
vectorize. 

A disadvantage of multigrid methods is that the occurrence of vectors of short length 
is inevitable, since use of coarse grids is necessary. This diminishes multigrid efficiency 
on vector computers. 

The foregoing observations on the advantages and disadvantages of the two types 
of methods suggest that combinations of them may be profitable. We compare the 
following methods: 

Method 1: GMRESR with ILU preconditioning; 

Method 2: Multigrid with Jacobi line smoothing; 

Method 3: Multigrid with ILU smoothing; 

Method 4: GCR with Method 2 as inner loop; 

Method 5: GCR with Method 3 as inner loop. 

In this paper, general boundary-fitted coordinates are used to compute flows in com- 
plicated geometries. In general coordinates, the incompressible Navier-Stokes equa- 
tions are formulated in standard tensor notation as follows [8]: 

dU a 

momentum equations — ^ — |- = —g a<3 p,p + Re~ 1 (g /3l U ^ + g a,3 Wp) 

continuity equation U a a = 0, (2) 

where U a is the contravariant representation of the velocity vector field, p the pressure, 
Re the Reynolds number, and g al 3 the metric tensor. The range of Greek indices is 
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{1, 2}. We use a staggered grid arrangement and a lexicographic ordering of the grid 
points. Due to the use of virtual cells the number of u 1 -, u 2 -, and p-points is the 
same. Using finite volume discretization in space and the backward Euler method for 
time discretization, we obtain the following discrete systems at each time step (see 
[8] for details): 


1 

A t 



A 11 A 12 A 13 
A 21 A 22 A 23 


A 31 



71+1 

= 0, 


( u 1 \ 

u 2 

Vp / 


n+1 


(3) 


(4) 


where u 1 , u 2 and p are algebraic vectors that approximate on the grid yfgU x and 
s/gU 2 and p, respectively, with sjg the Jacobian of the mapping, and f 1 and f 2 repre- 
sent source terms. The nonlinear terms have been linearized with Newton’s method. 
The linear operators (A 31 A 32 ), resulting from discretization of the divergence oper- 
ator in the continuity equation, and A 13 and A 23 , resulting from discretization of the 
gradients of the pressure in the momentum equations, do not depend on time. The 
remaining operators are time-dependent. 


Equations (3) and (4) are solved by the pressure correction method, as presented 
in [3], which consists of three steps. In the first step, the momentum equations are 
solved to give an intermediate value for the velocities, using the old pressure: 


2*1 + A 11 

A 21 


A 12 

^I + A 22 



/ \ n+1 / _ \ n / 

/ fl \ _1_ / u 1 \ _ / A 13 

[ f 2 ) ^ At y u 2 J [ A 23 


P n - (5) 


This equation system behaves like a discretization of a convection- diffusion equation. 
The main diagonal is enhanced by a contribution 1/A t due to the time-derivative. 
Then the pressure equation, which is derived from the momentum equation (3) and 
the continuity equation (4), is solved to give the difference p n+1 — p n : 

(A 31 A 32 ) ( £3 ) (P” +1 - P”) = ~Xt ( A 31 A 33 )("^ (6) 


The coefficient matrix of p n+1 — p n does not change with time, and resembles a 
discretization of the Laplacian operator (in general coordinates), but is not symmetric. 
Finally, the velocities at time step n + 1 are computed by means of 


IT 


u 




(7) 


In the next section we describe the iterative methods used for the solution of (5) and 

( 6 ). 
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2 Solution Methods 


In this section the iterative methods to be tested are described. The GMRESR 
method combined with ILU type preconditioners is given in Subsection 2.1. This 
is a summary of the methods described in [12]. In Subsections 2.2.1 and 2.2.2, the 
multigrid methods using an alternating Jacobi line smoothing and an ILU smoothing 
are presented. New methods, consisting of combinations of GMRESR and multigrid, 
are proposed in Subsection 2.2.3. 


2.1 Method 1: GMRESR with ILU preconditioning 


In Section 1 we have seen that there are two types of linear systems to be solved: 
the momentum equations and the pressure equation. Each has its own characteristic 
properties. We use GMRESR for both but with different preconditioners. The GM- 
RESR method is defined in [9], successfully applied to the Navier-Stokes equations 
in [11], and analysed further in [10]. The GMRESR algorithm can be formulated as 
follows: 


Algorithm GMRESR 
r 0 = b — Ax 0 , k = — 1 
while ||r* + i||/||r 0 || > tol do 
k := k + 1 

apply one iteration of GMRES(m) to Ay*, 
denote the result by 


rfc and 


,(°) 

'-k 

for 


Au 


(o) 


0, 1 , • • • , k — 1 do 


OLi 


tJS) 


C7 C 


.b+i) _ 


= cl 


end do 

(k) 

c k = c\ 


(i+i) 

CXiCi] 


u 


(0 


- O.U, 


,(*)l 


(k) 

U* =Uk I 


,(*)l 


XA, +1 = Xfc + UfcCjf r*,; r*, + i 

end while 




Ckcl?k 


GMRESR consists of a GCR outer loop and a GMRES inner loop. In every outer 
iteration, m iterations are used in the GMRES inner loop. Only in the final outer 
iteration it is possible to do less than m inner iterations (see [9]). In this paper the 
GMRESR algorithm is used with the l min alfa ’ truncation strategy (see [10]). A 
truncation strategy is necessary to restrict the required memory. Truncation means 
the following: choose the number ( ntrunc ) of search directions (u*) that may be kept 
in memory. If the number of iterations becomes larger than ntrunc , a search direc- 
tion u j and its companion c j(= Am, ) are overwritten by the new search direction 
Ujfc+i and Cfc+i. The min alfa truncation strategy is a method to decide which search 
direction should be discarded by the following criterion: find j such that aj = cjc^ x 
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satisfies the following equation: 


ctj\ = min | or,- 

0<.i<ntrunc 


( 8 ) 


To obtain an efficient solver, GMRESR is combined with a preconditioner. For the 
pressure equation we usethe classic incomplete LU decomposition (all fill-in is ne- 
glected). For the details of this preconditioner and the combination with GMRESR 
we refer to [12]. We use an ILUD preconditioning for the moment um equations. In 
this type of preconditioning the off-diagonal parts of L and U are the same as that of 
the given matrix and only the diagonal is adapted. In all the numerical experiments 
given in Section 3, we use the GMRESR(5) method (so m = 5). 


2.2 Multigrid methods 


In this paper we use multigrid methods consisting of the F-cycle with one pre- and one 
post-smoothing. In Subsection 2.2.1 the coarse grid operators are defined. The two 
smoothing operators used are given in Subsection 2.2.2, corresponding to Methods 2 
and 3. In Subsection 2.2.3 the combined methods are given. 


2.2.1 Formulation of coarse grid operators 


Coarse grid operators are formulated by means of Galerkin coarse grid approximation 
[13]. For brevity, we write equations (5) and (6) as 


( A 11 A 12 W u 1 \ _ / f 1 \ 
\A 21 A 22 j [ u 2 ) ~ [ f 2 ) ’ 

A 33 p = f 3 . 


(9) 

(10) 


Let / be the grid index, with 1=1 indicating the coarsest grid. Galerkin coarse grid 
approximation is carried out from grid l + 1 to grid / as follows: 
momentum equations 


/ A ll(/) A 12(/) 

I a 21 « a 22 ^ 

/ fi(0 
[ f 2 (') 


/ RiA^'+^P 1 
\ R 2 A 21 ( ,+1 )p 1 
/ Rl r l(i+1) \ 

y R 2 r 2(Z+l) J 


R1 A 12('+1)P 2 \ 

R 2 A 22(Z+l)p2 J ’ 


( 11 ) 


and pressure equation 

A 33 (0 = R 3 A 33('+Dp3 5 f 3(0 = R3 r 3(Z+l), ( 12 ) 


The r’s are the residuals, for example, r 3 = f 3 — A 33 p. Here the R’s and P’s are 
restriction operators and prolongation operators, which are described below. 
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Standard cell-centered coarsening is used: a cell on the next coarse grid is formed by 
taking the union of four fine grid cells. The restriction operators R 1 and R 2 are for the 
momentum equations and R 3 for the pressure equation. The prolongation operators 
P 1 , P 2 and P 3 are applied to u 1 , u 2 and p, respectively. The prolongation used for 
the coarse grid corrections is the same as in Galerkin coarse grid approximation. 


The operators R 1 and R 2 use so-called hybrid interpolation, which, for example for 
R 1 , is obtained by using the adjoint of linear interpolation for u 1 in direction 1 but 
the adjoint of piecewise constant interpolation in direction 2. Operator R 3 is simply 
the adjoint of piecewise constant interpolation. Operators R 1 and R 3 are given by 


fit 1 ] = 1 

we 

2 

we 

, [r 3 1 

_ 1 

’ 1 

1 ' 

L J 2 

we 

2 

we 

’ L J 

“ 2 

1 

1 


(13) 


where w = 0 when the ‘west’ points are on or outside of the ‘west’ boundary and 
w = 1 elsewhere, and similarly for s, e and n. R 2 is similar to R 1 . The elements 
with an underscore correspond to the fine grid point 2k when restriction results in a 
function value in the coarse grid point k. The prolongation operators P 1 , P 2 and P 3 
employ bilinear interpolation. The adjoints P 1 * and P 3 * of P 1 and P 3 are given by: 




nw 

2 n 

ne 






(4 — n)w 
(4 — s)w 

2(4 - n) 
2(4 - s) 

(4 — n)e 
(4 — s)e 

? 



(14) 

sw 

2s 

se 






nw 

n(4 

— w) 



n(4 — e) 

ne 


(4 — n)w 

16 — 4(n + it?) -f nw 

16 - 

- 4(n + e) + ne 

(4 — n)e 


(4 — s)w 

16 — 4(s + w) + sw 

16- 

- 4(s + e) + se 

(4 — s)e 

? 

sw 

s( 4 

— w) 



s(4 — e) 

se 



and P 2 * is similar to P 1 *. For a more detailed exposition of these transfer operators, 
see [13] and [16]. 


2.2.2 The smoothing operators 

In this subsection we describe the smoothers which are used in the multigrid method: 
Jacobi smoothing and ILU smoothing. The reason for this choice is that Jacobi 
smoothing has good vectorization (parallellization) properties but is not robust, whereas 
the ILU smoothing is robust but not easily vectorized. 

Method 2: Multigrid with Jacobi smoothing 

Our Jacobi smoothing method consists of one horizontal Jacobi line iteration followed 
by one vertical Jacobi line iteration. The momentum equations are smoothed in a 
decoupled way, i.e., the two momentum equations are smoothed successively. In a 
horizontal smoothing iteration, mutually independent tridiagonal systems have to be 
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solved: MjSxj = rj for a horizontal line j. The three non-zero elements at row i in 
Mj are denoted by by, d,-j, and u,j. The matrix Mj is factorised into: 

Mj = (Lj + Dj)D~ l (Dj + Uj) (15) 

where Lj and Uj have only one non-zero diagonal below and above the main diagonal, 
equal to l it j and u, tJ and Dj is a diagonal matrix. Comparable formulae are used in 
a vertical smoothing iteration. Variables are updated after each horizontal and after 
each vertical step with a fixed underrelaxation factor w = 0.7. 

Method 3: Multigrid with ILU smoothing 

Suppose that the equation to be smoothed is denoted by 

Ax = b. (16) 

A smoothing iteration is given by 

Sx = M -1 (b — Ax),x := x + u>6x (17) 

with u — 0.8 fixed. For the ILU smoothing we choose M = (L + D)D -1 (D + U), 
where L and U are strictly lower and upper triangular matrices, and D a diagonal 
matrix. Matrices L and U have non-zero entries in the positions corresponding to 
the standard 9-point stencil pattern and are chosen such that the elements of M 
belonging to the 9-point pattern are equal to the corresponding elements of A. The 
momentum equations are smoothed in the same decoupled manner as in Method 2. 
Again, factorization takes place only at the beginning of multigrid iterations for a 
time step, and L, D and U are kept until the next time step. 

2.2.3 The combined methods 

The methods presented below are very flexible. In many other combinations of Krylov 
subspace and multigrid methods, the inner loop procedure must be the same for ev- 
ery outer loop iteration. In these methods this is not necessary, so in different outer 
iterations one may use different inner loops, for instance a mix of GMRES and multi- 
grid, or a different number of iterations with multigrid or multigrid with different 
smoothers, etc. The methods are based on the GMRESR idea where we use a GCR 
outer loop and a GMRES inner loop. The algorithms for the new methods are given 
below and only differ in the construction of the new search directions. 

Method 4: GCR with Method 2 as inner loop 

This method is obtained by replacing GMRES (m) in the inner loop of Method 1, by 
Method 2. 

Method 5: GCR with Method 3 as inner loop 

This method is obtained by replacing GMRES(m) in the inner loop of Method 1, by 
Method 3. 
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3 Numerical Experiments 


3.1 Test Problems 


We consider four test problems: an oblique driven cavity problem, an L-shaped driven 
cavity problem, a backward facing step problem [2], and a 90° bend problem [11]. The 
grids used for these problems are shown in Figure 1. We study these problems for 
various time steps and grid sizes. Furthermore for every problem two values of the 
Reynolds number are used. For the driven cavity problems we take Rei ow = 1 and 
Rehigh = 1000, in the backward facing step problem Re\ ow = 50 and Rehigh = 150, 
whereas in the bend problem Rei ow = 500 and Rehigh = 1000. The number of time 
steps is fixed at 40. . This number is a rather arbitrary choice, because our purpose 
here is not to solve problems until steady state, but to investigate the performance 
(efficiency and robustness) of solution methods. Based on numerical experiments, the 
following stop criterion is chosen: the iterative solution of the systems at each time 
step is terminated if the ratio of the norm ||r|| of the residual to the norm ||ro|| of 
the residual at the beginning of the present time step satisfies 1 1 r 1 1 / 1 1 r 0 1 1 < to/, with 
tol — 10 -4 for the momentum equations and tol = 10~ 6 for the pressure equation. In 
Subsection 3.2 experiments on a scalar computer are described whereas Subsection 
3.3 contains the results on a vector computer. 

3.2 Experiments on a scalar computer 

In this subsection we present numerical experiments on an HP 735 computer. We have 
run all methods described in Section 2 for the test problems given in Subsection 3.1. 
For brevity, here we only present a representative subset of the results. In Subsection 
3.2.1 the momentum equations are considered, whereas in Subsection 3.2.2 we show 
results for the pressure equation. 

3.2.1 The momentum equations 

The properties of the linear systems originating from the discretized momentum equa- 
tions that influence the iterative solvers depend on: the size of the time step, the 
Reynolds number, the grid size, and the shape of the space domain. Below, the in- 
fluence of these parameters is considered in more detail. In the first part we restrict 
ourselves to the oblique driven cavity problem, only in the final part results are given 
for all test problems. The reason for this is that the results for the other problems 
are comparable with those of the oblique driven cavity problem. 

Dependence on At, the Reynolds number and the grid size 
In Table 1 we give some measurements concerning Method 1 and Method 3 applied 
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a. 



b. 



c. d. 

Figure 1: Grids for the four test problems: a. The oblique driven cavity problem 
(32 x 32); b. The L-shaped driven cavity problem (32 x 32); c. The backward facing 
step problem (48 x 16); d. The 90° bend problem (16 x 64). 
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Re = 1 


Re 

= 1000 


Grid 

At 

U 



pvi Pp 

** 


jj^Jj 


Pvi Pp 

Method 1 

32 

.0625 

19 

7, 7 

5, 9 


13 

1, 7 

1, 

8 


X 

.125 

19 

7, 7 

5, 8 


14 

2, 7 

2, 

8 


32 

.25 

19 

8, 7 

5, 8 


15 

3, 7 

3, 

8 


64 

.0625 

151 

74, 57 

8,13 


90 

14, 56 

2, 

12 


X 

.125 

158 

81, 57 

9,12 


93 

18, 55 

3, 

11 


64 

.25 

162 

86, 57 

10,13 


104 

28, 56 

4, 

12 


128 

.0625 

1501 

774,642 

14,22 


830 

97,648 

2, 

22 


X 

.125 

1617 

879,653 

18,22 


870 

132,652 

4, 

22 


128 

.25 

1655 

917,653 

20,23 


951 

213,653 

6, 

23 



Method 3 

32 

.0625 

74 

26, 35 

4,14 

.246, .371 

63 

20, 30 

3, 

12 

.0871,.366 

X 

.125 

74 

27, 35 

4,14 

.240,.373 

66 

24, 30 

4, 

11 

.142 ,.313 

32 

.25 

74 

27, 34 

4,14 

.226,.372 

72 

29, 30 

6, 

11 

.250 ,.346 

64 

.0625 

257 


4,14 

.229, .370 

224 

76,110 

3, 

12 

.0933, .351 

X 

.125 

255 


4,14 

.215, .371 

240 

91,111 

4, 

11 

.138 ,.366 

64 

.25 

252 

97,117 

4,13 

.200, .370 


93,109 

4, 

11 

.214 ,.357 

128 

.0625 

1073 

424,499 

4,13 

.203, .370 


395,470 

4, 

10 

.163 ,.354 

X 

.125 

1058 

425,484 

4,13 

.194,.370 

1056 


4, 

11 

.179 ,.338 

128 

.25 

1045 

425,470 

4,12 

.190,.368 

1099 


4, 

12 

.191 ,.358 


Table 1: The oblique driven cavity problem on the HP: the total CPU time t t , the 
CPU times t v and t p for the solution of the momentum equations and the pressure 
equation, respectively, the numbers of iterations k v and k p in the final time step, and 
the reduction factors p v and ,p p of the multigrid algorithm in the last iteration in the 
final time step. 


to the oblique driven cavity problem. The behaviour of the other methods is compa- 
rable to Method 3. We observe that the number of iterations of Method 3 is more or 
less independent to the various choices of A t, Re, or the grid size. 

Now, we consider the dependence of Method 1 (GMRESR) for the various choices. 
The main diagonal of the momentum matrix is enhanced by a contribution 1/A t due 
to the time derivative. So for small At the matrix is diagonal dominant. It appears 
from Table 1 that the number of iterations of the GMRESR method grows, if At 
increases. Comparing the results for the two Reynolds numbers, it appears that GM- 
RESR converges much faster for Re = 1000 than for Re = 1. Finally, as expected, 
the number of GMRESR iterations increases for increasing grid size. 
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Obique driven cavity, Re = 1 


L-shaped driven cavity. Re = t 




Figure 2: CPU times per grid point on the HP for the momentum equation during 
40 time steps, for Rei ow and At = 0.0625. 


Problem dependence and comparison 

For a comparison of the various methods on the four test problems we plot the CPU 
time on an HP 735 per grid point for 40 time steps against the grid size. In these 
figures we use the following symbols: 


Method 1: 
Method 2: 
Method 3: 
Method 4: 
Method 5: 


solid lines and point marks, 
dotted lines and circles, 
dashed lines and stars, 
dotted lines and plus marks, 
dashed lines and x-marks. 


Where no symbols are shown they are off-scale. For Rei ow the results are given in 
Figure 2 and for Rehigh the results are given in Figure 3. 

First we discuss the combination of GCR and multigrid. From Figures 2 and 3 it 
appears that the GCR acceleration of the Jacobi smoothed multigrid is better than 
multigrid itself. If the smoother is sufficiently powerful, as for instance for Method 3 , 
where we use an ILU smoother, then the combination of GCR and multigrid gives a 
slightly worse performance. In these cases, the number of iterations is the same but 
the CPU time increases somewhat due to the GCR overhead. 
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Figure 3: CPU times per grid point on the HP for the momentum equations during 
40 times steps for Rehigh and At = 0.0625. 

Secondly, we compare Method 1 with the best multigrid method: Method 3. It ap- 
pears that for Method 3 the CPU time per grid point is independent of the grid size 
and the Reynolds number. For Method 1 there is more variation: the CPU time 
increases for a larger grid size and a smaller Reynolds number. For a large Reynolds 
number Method 1 is much faster than Method 3. For the driven cavity problems and 
a small Reynolds number, Method 1 is more efficient for medium grid sizes, whereas 
Method 3 is the best method for large grid sizes. For the oblique driven cavity prob- 
lem the break-even point is in the range [64, 128] and for the L-shaped driven cavity 
problem the break-even point is in the range [128, 256]. 

Finally we discuss robustness. Methods 1, 3 and 5 are equally robust. For most prob- 
lems they work well. Only for the 90° bend problem there are some failure cases (not 
shown here) when At is large and Re large. The least robust method is Method 2; 
it suffers from convergence problems when either the grid is refined or At is large for 
some problems. But when it is combined with GCR, resulting in Method 4, robust- 
ness is improved very much. Sometimes when Method 2 fails to work, Method 4 still 
works rather satisfactorily. However, Method 4 falls behind Methods 1, 3 and 5 for 
Re large. 
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Figure 4: CPU times per grid point on the HP for the pressure equation during 40 
time steps. 


3.2.2 The pressure equation 


The properties of the discretized pressure equation depends only on: the grid size 
and the shape of the space domain. 

Grid size dependence 

The multigrid and combined methods require the same number of iterations for in- 
creasing grid size. Again Method 1 depends on the grid size; the number of iterations 
grows for increasing grid size. This is illustrated by Table 1 where the results for the 
oblique driven cavity problem are given. 

Problem dependence and comparison 

The CPU time on an HP 735 per grid point for 40 time steps is shown in Figure 4. It 
appears that for both smoothers the combination of GCR and multigrid is more ef- 
ficient then multigrid itself. Especially in the oblique driven cavity problem, Method 
4 is two times as fast as Method 2. Also for the strong ILU smoother the CPU time 
for Method 5 is considerably less than for Method 3. 

Finally, we compare Method 1 with the best multigrid method: Method 5. It appears 
that Method 1 is more efficient for medium grid sizes, whereas Method 5 is more 
efficient for large grid sizes. For the driven cavity problems the break-even point is in 
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range [64, 128] whereas for the other problems the break-even point is in the range 
[32, 64]. For the pressure equation Method 1 has a super linear convergence behaviour 
[12], which means the reduction of residuals is faster in later iterations than in the first 
ones. Since the multigrid and combined method are linear convergent, this implies 
that decreasing the termination criterion tol would benefit Method 1 and vice-versa. 


3.3 Experiments on a vector machine 


In this subsection we report on some experiments on a Convex C3840. First, we com- 
pare Methods 1, 3 and 5, because they are the best methods on the scalar machine 
and have different vectorization properties. Thereafter, Methods 3 and 5 are com- 
pared with Methods 2 and 4 to analyse the performance of methods using a weaker 
smoother but with greater vectorization potential and using a stronger smoother but 
with smaller vectorization capability. 


Comparing the best methods 




Figure 5: CPU times per grid point on the Convex during 40 time steps for the L- 
shaped driven cavity problem, with Re = 1 and At = 0.0625. Left: the momentum 
equations, right: the pressure equation. 

In Figurp 5 we present the CPU time per grid point against grid size for the L-shaped 
driven cavity problem. To show the effect of an increasing vector length, computa- 
tions on a 256 x 256 grid are included. From this figure it appears that the convergence 
behaviour of the methods is comparable to that on a scalar machine: the efficiency of 
Method 1 deteriorates and that of Methods 3 and 5 improves with grid refinement. 
Due to the good vectorization properties of the Krylov methods the break-even point 
moves to finer grids and the GCR overhead for the combined methods becomes neg- 
ligible. Finally, the curves for Methods 3 and 5 become flatter when going to finer 
grids, which indicates that the efficiency gain from a larger vector length is gradually 
exhausted. 
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Comparing the vectorization properties of the smoothers 
It appears that the higher Mflop rate of Methods 2 and 4 does not compensate the 
slower rate of convergence, although on the vector machine they compete better than 
on the scalar machine. This is true for all test problems and is illustrated with the 
momentum equations of the L-shaped driven cavity problem in Table 2. Note that 
for a low Reynolds number Methods 2 to 5 are comparable, but for a high Reynolds 
number Methods 3 and 5 are superior to Methods 2 and 4. Method 2 does not work 
on finer grids and even fails on the 256 x 256 grid. 


grid size 


Re 

= 1 



Re = 

1000 


32 

64 

128 

256 

32 

64 

128 

256 

Method 2 

0.039 

0.022 

0.013 

0.011 

0.028 

0.023 

0.045 

oo 

Method 3 

0.040 

0.021 

0.012 

0.010 

0.032 

0.017 

0.011 

0.010 

Method 4 

0.040 

0.020 

0.012 

0.010 

0.032 

0.020 

0.017 

0.017 

Method 5 

0.043 

0.022 

0.013 

0.010 

0.033 

0.018 

0.012 

0.009 


Table 2: CPU time per grid point on the Convex during 40 time steps for the mo- 
mentum equations for the L-shaped driven cavity problem. 


4 Conclusions 

We have investigated numerically five iterative methods, namely, Method 1: GM- 
RESR: GCR with GMRES as inner loop, Method 2: multigrid with a Jacobi line 
smoothing, Method 3: multigrid with an ILU smoothing, Method 4: GCR with 
multigrid with Jacobi line smoothing as inner loop and Method 5: GCR with multi- 
grid with ILU smoothing as inner loop, in the context of application to the solution of 
the incompressible Navier-Stokes equations in general coordinates on staggered grids, 
using the pressure correction method in the time-dependent case. 

From our numerical experiments we draw the following conclusions: 

- For the solution of the momentum equations with a high Reynolds number 
Method 1 is the best method. 

- For solving the momentum equations with a low Reynolds number Method 1 is 
faster for medium sized grids, whereas Method 3 is the best method for large 
sized grids. 

- For the pressure equation Method 1 is also optimal for medium grid sizes. For 
large grid sizes Method 5 is the most robust and efficient method. 

- The GCR outer loop of Methods 4 and 5 speeds up the rate of convergence, 
especially for weak smoothers (Method 4). 
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Finally, we remark that the break-even point, where the efficiency of the Krylov sub- 
space method is equal to that of the multigrid method, depends on many factors. 
Some of them are: the domain of the test problem, the termination criterion, the 
Reynolds number, the computer used (scalar, vector, or parallel), etc. In Section 3 
we have investigated numerically in which direction the break-even point moves de- 
pending on a change of one of these factors. 
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AN ALGEBRAIC MULTIGRID SOLVER FOR NAVIER-STOKES 
PROBLEMS IN THE DISCRETE SECOND-ORDER 
APPROXIMATION 


R Webster 

Roadside, Harpsdale, Halkirk, Caithness, KW12 6UL, Scotland, UK 


ABSTRACT 

An algebraic multigrid scheme is presented for solving the discrete Navier-Stokes 
equations to second-order accuracy using the defect-correction method. Solutions have 
been obtained for problems involving both structured and unstructured meshes, with the 
resolution and resolution grading controlled by global and local mesh refinements. 

The solver is efficient and robust to the extent that no underrelaxation of variables has 
been required to ensure convergence, but rates of convergence can be improved with small 
amounts of underrelaxation of the velocity-pressure coupling. Provided that the 
computational mesh can resolve the flow field, convergence characteristics are almost mesh 
independent. Rates of convergence actually improve with refinement, asymptotically 
approaching mesh independent values. For extremely coarse meshes where dispersive 
truncation errors would be expected to prevent convergence (or even induce divergence), 
solutions can still be obtained by using explicit underrelaxation in the iterative cycle. 


INTRODUCTION 


Solution of the equations of motion for viscous fluids in the discrete approximation 
demands powerful computing resources. This is because the flow fields of practical interest 
are invariably complex and require a high degree of spatial resolution. Resolution of length 
scales that span many orders of magnitude may be necessary even for stable lamina flows. 
If Q is some measure of the linear resolving power of a discretisation (such as an 
appropriately scaled inverse of the nodal separation), then the number of discrete equations 
to be solved, N, will scale as 


N~Qd (1) 

where d is the number of spatial dimensions. Since, moreover, the computational work will 
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scale as NP, where (3 depends on the solution method ((3 > 1.0), the required computing 
time, T, will scale as 


T ~ QP d (2) 

Clearly T can be a very 'strong function of the required resolution. For example, for 3D 
finite-element problems that require direct solution methods (such as Gaussian elimination), 
the exponent can be as large as 9 (i.e., (3 = 3, d = 3). Since in fluid dynamics we are looking 
for orders-of-magnitude improvements in resolution it is essential to develop efficient 
solvers with optimum scaling (|3 = 1.0). It is also important that this scaling hold good for 
non-uniform, unstructured meshes so that the nodal economy can be maximised by 
matching the density of nodes to the required resolution, which may be both anisotropic and 
inhomogeneous. 

In a previous paper [1], a new iterative solver was presented for the discrete Navier- 
Stokes equations in the first-order approximation which addressed these requirements. The 
method was based on a fully implicit Algebraic Multigrid (AMG) scheme. This paper 
describes changes to the scheme which can virtually eliminate the need for underrelaxation 
in the iterative cycle. Performance data have been obtained for a number of problems on 
both structured and unstructured computational meshes. Here results for the sudden- 
expansion test problem are presented for second-order accuracy using the defect correction 
method. 


THE DISCRETE APPROXIMATIONS 

The discrete equation sets for the flow variables are derived from a finite-volume 
discretisation of a finite-element mesh by enforcing the conservation of mass and 
momentum for an incompressible fluid. The simplest possible linear element is used : the 
triangle (in 2D), which is capable of giving second-order accurate equations. Control 
volumes are constructed around each vertex node by joining the centroid of each element to 
the centre of each side (Figure 1). Within any given element, just one flux value is used for 
the control surfaces so formed, and this is obtained by a special interpolation. The centroid 
provides the single interpolation point. A second discretisation within the element is used to 
derive the interpolation equation. Figure 2 shows three examples of the subcontrol volumes 
that have been used; the smallest is the one chosen for this work. The scheme is similar to 
those proposed by Prakash[2], Hookey[3], and Schneider and Raw[4]. 

If v represents the set of nodal velocities, v e the set of interpolated velocities within 
elements, and p the set of nodal pressures, then enforcing the conservation laws for both 
nodal control volumes and element sub-control volumes delivers the following set of 
algebraic equations: 


A(v e ) v + G p = s 


(3) 
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Figure 1: Illustrating the linear triangular element, element assembly, and the construction 
of the control-volume tesselation; one control volume is highlighted. 


A e (v e ) v e + F(v e ) v + G e p = s e 

(4) 

Dv e =0 

(5) 


where A and G are the nodal advection-diffusion and gradient operators respectively; A e 
and F are each part of the advection-diffusion operator for elements; G e is the element 
gradient operator; D is the nodal divergence operator; and s and s e represent the momentum 
source/sink arrays for the nodal control volumes and for the element sub-control volumes, 
respectively. 

The matrix A e is diagonal, so the solution of equation (4) is trivial; that is, 

v e = A e - 1 (s e -Fv-G e p) (6) 

Direct substitution into equation (5) enables the following subset of coupled equations to 
be formed for the nodal variables: 


l 

> 

2 

o 

1 

V 


s 

(DA e -iF) (DA e -iGg) 

P 


(DAg^Sg) 


(7) 


The solution of equations (6) and (7) is obtained by direct iteration using a predictor- 
corrector strategy for v e and [v p]; the AMG solver providing the coupled solution of 
equation (7) for [v p ]. 

If upstream values are used in the enforcement of momentum conservation for nodal 
control volumes, then equation (7) will be first-order accurate. For this work, a second- 
order approximation is also required. The simplest possible second-order approximation 
was adopted using equal proportions of upstream and downstream values for the advected 
momentum across the control surfaces, equivalent to the central differencing of finite- 
difference methods. 


Figure 2. Interpolation for element velocities, v e : three subcontrol volumes that have been 
used for a local discrete solution of the equation of motion. 

THE ITERATIVE SOLUTION METHOD 
By writing equations (6) and (7) in the more concise form as 

v e = A e -i(s e -Hcp) (8) 

L(v e )<p = f (9) 



A(v e ) G 


V 


s 

where L(v e ) = 

(DA e -iF) (DA e iG e ) 

. 9 = 

P 

, H = [ F G e ], f = 

(DA e -is e ) 


and by writing the first and second order approximations of L(v e ) and f as Lj, L 2 and f l5 f 2 
respectively, the following iterative procedure can be constructed [5] starting with v° e = 0 
and cp° = 0: 

v e n = Ae' 1 ( s e - H cp n ) n > 0 

Lj (v e n ) cp n+1 = fj n n < m ( 10 ) 

Li(v e n ) tp n+1 = f 2 n + [ L^Vg") - L 2 (v e n ) ] (p n n > m 

where m marks a suitable point in the iteration sequence for switching on the defect 
correction, [ (L^Vg 11 ) - fj n ) - (L 2 (v e n ) - f 2 n ) ]q> n . At convergence cp n+1 = tp n = cp, and the 
second-order equation 


L 2 (v e )cp = f 2 n (11) 

will be satisfied within the permitted tolerance. The convergence should, moreover, proceed 
at a rate determined more by the properties of Lj than those of L 2 . 


The equation system 


L 1 (v e I1 )<p n+1 = P 1 


( 12 ) 


where f“ is now understood to include the defect correction if n > m, may be represented 
graphically as a connected nodal network with a one-to-one correspondence between 



variables (equations) and nodes; the connections between nodes represent the coupling 
between equations. For like variables, there will also be a one-to-one correspondence 
between connections and the edges of elements in the computational mesh. For unlike 
variables, connections may be regarded as displacements in an abstract dimension. To 
distinguish the nodal network from the computational mesh, it will be referred to as the 

“ algebraic grid ” or simply the grid. 

In an iterative solution procedure based on point relaxation, each node of the grid is 
visited in turn and that variable is updated/corrected entirely on the basis of local 
information (i.e., from those neighbours to which the node has direct connections). Because 
of this, a single sweep through the grid system will only see changes propagating short 
distances (i.e., of order one nodal spacing). Long range propagation is a diffusion-like 
process that requires many iterative sweeps. If is a relevant propagation distance 
expressed in units of nodal spacing, then the number of iterations required, n, will scale as 

n ~ A.j 2 = (Q/Qj) 2 (13) 

where Q is the maximum resolving power; Qj is the minimum resolving power required for 
the resolution of A.j. Since the computational cost of one iteration will scale as N, the total 
number of nodes to be visited, the required computing time will scale as 

T ~ NQ 2 = Q?+ 2 . (14) 

Thus, from the grid system equivalent of equation (2) 

P = 1 + 2/d. (15) 

Clearly, solvers based on point/local relaxation can scale poorly, with P = 2 or (5 = 5/3 
for 2D and 3D problems, respectively. To achieve optimum P = 1 scaling it is necessary to 
have an efficient propagation of corrections over all length scales simultaneously. This 
requires multigrid methods. 

AMG methods [6,7] exploit a hierarchy of reduced equation sets (coarse grids) derived 
from and including the base set (fine grid). Ideally, coarse grid generation proceeds 
recursively such that each successive grid is a consistent representation of the problem at a 
reduced scale of resolution, Qj, associated length scale A.;. Just one sweep of a relaxation 
procedure at this level will be sufficient to propagate changes over A,j (i.e., Q = Qj ); hence, 
from equation (13), n = 1. With a sufficient number of grids spanning the complete range of 
length scales relevant to the problem, an efficient propagation over all length scales can 
take place simultaneously within one relaxation sweep. Thus, considering the first level of 
coarsening, if K is a suitably chosen restriction operator, it may be applied to the base 
set (12) to form the reduced system 


Lj c cp c = r c 


( 16 ) 


759 



where = (K Lj K T ). If r c is derived on the basis of the residual r = f - Ljcp: 


r c = Kr = K(f-L 1 tp) 


( 17 ) 


then a solution of equation (16) provides a correction tp c that can be used to improve cp : 

cp -» tp + K T (p e (18) 

The procedure is as follows: restrict residual errors to the coarse grid using equation (17) ; 
reduce the coarse-grid (long-range) errors by applying local relaxation methods to equation 
set (16); prolongate the coarse-grid correction and update the fine grid solution using (18) ; 
and reduce the fine-grid (short-range) errors by applying local relaxation to equation 
set (12). Clearly equation (16) has the same form as equation (12) so the procedure can be 
applied recursively to generate smaller equation sets for successively coarser scale 
corrections. In this way a “ multiscale ” correction, K T (p c , can be assembled for updating cp. 

A coarsening procedure based on that devised by Lonsdale [8] for scalar field variables 
has been used to generate the reduced equation sets. This consists of seeking out the 
equations with the strongest coupling (the largest off-diagonals in the L matrices) and 
joining them together by adding the corresponding matrix coefficients. Some care is 
required in implementing the procedure [8,1]. The elementary matrix representation of 
Lonsdale's restriction operator K (dimension N; x Nj, N ; < Nj < N), if required, can be 
formed by simply adding the appropriate rows of the Nj x Nj unit matrix. The reduction 
factors ( Nj / Nj ) may be freely chosen, though values of about 0.5 are usually used. 

Since here the equation system is for coupled vector and scalar fields, the procedure is 
implemented in a way which preserves the block structure of the L matrix operator. 
Combining equations for different field variable types is thus forbidden; coarsening is only 
permitted in “ real space ", equivalent to choosing a block-diagonal K matrix. Note that this 
does not prevent different coarsening for different field variables. 

The process can be terminated when no further reduction in the number of equations is 
possible, and the matrix dimension is then equal to the number of continuum flow variables. 
In [1] and in this work, however, the process is actually terminated earlier at between about 
30 and 60 equations. 

The elementary K-matrix restriction combines equations in equal proportion. However, a 
better coarse grid approximation can be achieved if fine grid equations are combined in 
proportions that respect their relative importance at the coarser level of resolution. 
Therefore provision is made here for a more general, weighted restriction. For AMG 
solvers, this is particularly important both for uniform and non-uniform discretisations alike 
because, even if an initial fine grid is a regular array of identical nodes, the algebraic 
coarsening process is unlikely to preserve such uniformity. Thus, if R and P are the actual 
restriction and prolongation operators to be used, then fine grid and coarse grid weighting 
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operators, W and W c , are introduced such that 


subject to the scaling rule 


R = [W C ] _1 KW 


(19) 


R I P = I c 


( 20 ) 


where the unit operator, I , for the fine grid transforms under the action of R and P into the 
unit operator, I c , for the coarse grid. Combining these equations gives 


W c = K W P. (21) 

For computational expediency P has been chosen to be simply K T in this work so that the 
coarse grid weighting operator is simply the fine grid operator transformed using 
elementary restriction and prolongation. 

For a finite-volume discretisation, a natural choice for W is the diagonal operator formed 
from the set of nodal control volumes. Equation (21) can then be simply interpreted as 
control-volume agglomeration and the restriction procedure R defined by equation (19) as 

1. Conversion of the fine grid equations into the naturally additive net flux form (W). 

2. Formation of the coarse grid equations (K,K T ). 

3. A conversion of the coarse grid equations back to the normal form ([W 0 ]" 1 ). 

The coarse grid approximation so produced results in a robust and an efficient solution 
algorithm. 

Following the R-restriction of residual errors down through the grid hierarchy, with Vj 
relaxation sweeps at each level, the multiscale correction is assembled by the reverse 
procedure of the upward P-prolongation of solutions (possibly scaled by a), this time 
applying v 2 relaxation sweeps following each prolongation. This is the well known V-cycle 
schedule, V(v 1 ,v 2 ). In this work, however, the full multi-grid cycle F(vj,v 2 ) has been 
adopted in which the upward leg of each cycle itself contains nested V-cycles (Figure 3). 
Furthermore, because the coarsest grid only contains between 30 and 60 nodes, a direct 
solver is used to obtain an accurate solution. 

Two relaxation schemes have been adopted, both based on point Gauss-Seidel (PGS) 
relaxation. For the intermediate coarse grids, PGS with optimum damping is used. If 
Li c = L + D + U is the standard splitting for Gauss-Seidel relaxation (L is the lower 
triangular block, U is the upper triangular block, and D the diagonal of L^), then the 
algorithm for v relaxation sweeps is 
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■ Direct solver 


Figure 3 . F-cycle strategy for transferring residuals and corrections. 


d' = (L+D ) _1 ( r'" 1 - U d i_1 ) 
z' = L^d' 

ai = <(zi) T , ri-i>/<(zi)T z i) (22) 

Cpc(i) = (p c (i'1 )+ o' d' 
r' = r i_1 - cr' z'. 

Before prolongation, the coarse grid corrections cp c are also scaled by the factor 

a = r c ) / ((L 1 c q> c ) T , (24) 

For the fine grid, an approximate 4-direction, point Gauss-Seidel algorithm for 
unstructured meshes is used (4-PGS). This involves some preprocessing for the formation 
of 4 continuous line orderings of nodes such that each node is visited once only within each 
line, and lines attempt, wherever possible, to pass through each node from different 
directions. 

The residual reduction factor, or fractional error reduction for each F-cycle, p, depends 
on the efficiency of the local relaxation process (smoothing) and on the quality of the coarse 
grid approximation [6,7,9]. Empirical p factors are defined and results presented for several 
test problems. 

Although Lj does not have to be positive definite, it must have block diagonal matrices 
that are suitable for solution by scalar AMG methods [6] ; diagonal blocks must be at least 
positive semi-definite. The first-order discretisation based on the advection of upstream 
momentum) produces block diagonal matrices for the velocity-component equations that 
should satisfy that requirement. The block diagonal matrix for the pressure equations is 
positive semi-definite in any case. 

Boundary conditions are implicitly contained in Lj. At least one pressure node is 
implicitly fixed in all calculations. No special measures are necessary for dealing with 
boundary conditions at the lower levels of the grid system. The necessary information is 
automatically transferred by the restriction operator. 
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Implicit underrelaxation of both velocity and pressure is commonly used to ensure 
convergence of Navier-Stokes linear solvers. For this coupled AMG linear solver, 
underrelaxation has not been necessary. Provided that the above described, weighting in the 
restriction procedure is employed, no underrelaxation has been required for any problems 
tackled so far. However, a small amount of underrelaxation can improve the rates of 
convergence for both inner and outer iterations. It can be implemented without prejudicing 
the long-range spatial coupling as follows. All entries in the off-diagonal blocks of Lj are 
reduced by a factor to and/or all entries in the diagonal blocks are increased by 1/co, with 
appropriate compensations of the right hand sides of the equation sets, evaluated using 
previous iterates cp n . Optimum convergence rates occur for co values in the range 
1.0 > co > 0.9. 

Note that it is also possible to relax the coupling between like variables by increasing 
just the diagonal entries of the relevant diagonal block and making the appropriate right- 
hand side compensations. This is not recommended. It loosens the spatial coupling that 
AMG is supposed to be dealing with, which results in a degradation of convergence 
performance (including the scaling ). 

PERFORMANCE 

The solver has been applied to a number of well established test problems. Here flow in 
a channel with a sudden asymmetric expansion is presented. This problem incorporates 
several features of complex fluid behaviour that can present difficulties for solvers, 
particularly at high Reynolds numbers (e.g., singularities, recirculation, boundary layers, 
entering flows, outlet flows). Some of these features have been isolated for special 
investigation by those involved in the development of multigrid methods. 

Of interest are the quality of the second-order solutions, the rates of convergence and, in 
particular,- the mesh dependence of both of these aspects of performance. To assist in the 
presentation and analysis of results it will be useful to introduce mesh resolution and 
grading factors and to define the convergence factors. 

Mesh Resolution and Grading Factors 

The inverse nodal separation (linear resolution) and its variation with direction and 
position (grading) is used to characterize the meshes. The global extremes of the resolution 
and grading will be sufficient for most purposes. Thus, reference is made to the maximum 
linear resolving power Q, the maximum global grading factor T, and the maximum local 
grading factor y. Q is defined as the ratio of the largest characteristic length scale divided 
by the closest nodal spacing. T is defined as the ratio of the maximum to minimum nodal 
separations for elements in the mesh regardless of their position. The local grading factor 
for any node in the mesh is the ratio of the largest to the smallest separation of the node 
from its immediate neighbours (i.e., for elements common to the node). Directional aspects 
are thus largely ignored except where reference is made to longitudinal and transverse 


763 



resolution and grading factors Q x , T x , y xx , Q y , T y , and y yy , respectively. Aspect ratio y xy will 
also be referred to. In this case the nodal separations in any chosen element are both 
selected and weighted according to their degree of alignment with the relevant direction. 

Convergence factors 

Convergence characteristics will be quantified in terms of the convergence factor p n , 
where 


p n = II 8(p n 11^ / II 8(p n_ hl 00 (24) 

where 8cp n is the multiscale correction for the iteration index n. Thus, the larger the rate of 
convergence, the smaller the convergence factor. The average convergence factor p for a 
sequence of N, Navier-Stokes (i.e., outer) iterations is 

P = { II StpN \\„ / II 8cp° II* }i/N = {IV P n } 1/N (25) 

The residual reduction factors, p and p ; , for inner iterations are defined similarly but in 
terms of the Euclidian norm of the residual errors, that is 

pi = II ri ll 2 / II ri- 1 ll 2 (26) 

where in this case r* is the residual following the F-cycle, index i. 

Various F-cycle schedules have been tried from F(1,0) to F(8,2). On the fine grid, v 2 = 1 
actually corresponds to one application of the 4-PGS smoother. 

In practice, the important convergence parameter is the fractional reduction of error per 
unit of computing time which may not be quite the same as the reduction of error per 
iteration as defined in equation (26). However, with a fixed number, v, of F-cycles per 
iteration the computing time per iteration will be more or less constant; then as long as 
p v « p, p will be equivalent to the convergence rate in time for all practical purposes. The 
number of F-cycles does not have to be large to satisfy this requirement. Also, there is little 
if anything to be gained by insisting that p v be extremely small, since much of the work 
done will be immediately undone when the non-linear terms are updated in the outer 
iteration. 


ASYMMETRIC SUDDEN EXPANSION TEST PROBLEM 

To test the solver on a problem with inflow and outflow boundary conditions, it has been 
applied to the asymmetric, sudden-expansion problem. This is a high aspect ratio problem, 
so it offers a convenient test for the performance of the solver on meshes with highly 
elongated elements. 
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Flow enters a two-dimensional channel with a parabolic inlet velocity profile. Some 
distance from the inlet there is a one sided step increase in channel width to 3/2 the 
original. How separates at the re-entrant comer and a re-circulation zone is established 
after the step. The axial extent of the circulation is marked by the point of re-attachment, or 
the point at which uni-directional flow is re-established across the entire width of the 
channel. This depends on the Reynolds number. Re. Results have been published for 
Reynolds numbers up to and, in some cases, exceeding Re = 250. Re is based on step 
height and mean inlet velocity (note that this definition gives values 6 times smaller than 
those based on hydraulic diameter and maximum inlet velocity.) 

A significant length of the expanded channel (exceeding 3 hydraulic diameters) needs to 
be modelled to ensure that the imposed outlet boundary condition does not unduly 
influence the behaviour upstream. Thus, the problem is bound to be one of large aspect 
ratio ( ~10 ) and, in view of the need for fine resolution near the point of separation, the 
discretisation could prove to be nodally expensive if uniform meshes are used. Thus, only 
non-uniform meshes have been adopted for this investigation and results for just one 
unstructured mesh type have been selected for presentation. 

The prototype triangulation is illustrated in Figure 4. It consists of 8 1 proto-elements 
which have been assembled to give the highest resolution at the point of separation and so 
that the lateral resolving power Q y is maintained moderately high up to the point of re- 
attachment. The actual meshes used were obtained by a q-fold nested refinement of each 
proto-element into as many as q 2 = 64 congruent triangles, giving a finest mesh of 
5184 elements (2717 nodes). The mesh is anisotropic and inhomogeneous with grading 
factors y xx = 4, Yyy = 4, y xy = 5.3, T x = 32, T y = 4. Dirichlet boundary conditions for velocity 
and free pressure boundary conditions apply on all surfaces except the outlet. The latter 
(continuitive and constant pressure) was placed 38 step lengths from the expansion. 



consisting of 81 proto-elements. Q x = 5T x q ; Q y = 3T y q ; where q = level of nested 
refinement. T x = 32; T y = 4; Yxx = 4; Yyy = 4; Y xy = 5.3. 


The reduction factors for this test problem were wit hi n the expected range for point 
Gauss-Seidel relaxation. Table 1 gives the average values for a low Reynolds number. Both 
definitions of Reynolds number are used (i.e., the first, Re, is based on step height and 
average inlet velocity, and the second, Re h , is based on hydraulic diameter and maximum 
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inlet velocity). 


N 

1236 

2133 

3273 

4656 

8151 

q 

3 

4 

5 

6 

8 

p[F(l,0)] 

.109 

.159 

.184 

.215 

.306 

p[F(2,l)] 

.042 

.059 

.091 

.114 

.143 


Table 1: Reduction factors for the asymmetric -sudden- expansion test problem; 
Re = 16.67; Re h = 100. 


Convergence factors for the finest mesh for the same range of Reynolds numbers are 
presented in Table 2. This reveals slower rates of convergence; nevertheless, these rates are 
still better than those for segregated solution methods. In Table 3, typical values for p are 
given at four different levels of refinement at just three selected Reynolds numbers. 

The convergence performance would appear to be better than that achieved by Dick and 


Re 

16.67 

50 

100 

150 

200 

Re h 

100 

300 

600 

900 

1200 

P 

.426 

.587 

.684 

.754 

.816 


Table 2: Convergence factors for the asymmetric-sudden-expansion test problem; 
level of refinement q=8; number of unknowns = 8151. 


N 

2133 

3273 

4656 

8151 

q 

4 

5 

6 

8 

p(Re=16.7) 

.464 

.432 

.426 

.426 

p(Re=50) 

.602 

.608 

.587 

.587 

p(Re=150) 

.911 

.807 

.771 

.754 


Table 3: Convergence factors for the asymmetric-sudden-expansion test problem; 
N = number of unknowns; q = level of nested refinement. 


Linden [10], who obtained second-order accurate, coupled solutions to the same test 
problem discretised using a flux-difference splitting approach. They also used a defect- 
correction scheme, but their solver was based on a geometric (FAS) multigrid method. 
Their published result for the case corresponding here to Re = 100 was p = 0.81, which 
compares with p = 0.68 in Table 2. Dick and Linden also reported a deterioration in 
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convergence performance with mesh refinement, which has not been observed in this work. 
The evidence is for constant or improving convergence rates with mesh refinement 
(Table 3). 


Navier-Stokes Performance. 

The axial extent of the recirculation eddy following the step expansion will be used as 
the gauge for assessing the quality of the solutions. Experimental data is available, but not 
for a truly parabolic inlet velocity profile. Predictions of the experiment would have to be 
based, therefore, on the measured profile, which is known to result in a short eddy. Since 
over-diffusive calculational methods would tend to underpredict the eddy length anyway, 
there could well be fortuitously good first-order calculations of this experiment wherever a 
parabolic inlet velocity profile has been mistakenly used. Here such complications are 
avoided by assessing the performance against other calculations of the idealised problem 
only. Thus the results are compared with the higher-order accurate calculations of Hutton 
and Smith [1 1] and with the first and second-order accurate calculations of Shaw [12]. 

For Reynolds numbers up to Re = 200, the resolution requirement should be satisfied for 
the mesh specified in Figure 4 (for q = 8). Results for the range Re = 16.7 to Re = 200 are 
given in Figure 5 as the 5 filled-circle data points. For comparison, two sets of data from 
Hutton and Smith are plotted, one as a continuous curve, which was obtained using a 
coarse mesh of 69 biquadratic rectangular elements (246 nodes), and the other as 4 open- 
circle data points obtained using a finer mesh of 256 quadratic triangular elements 


ASYMMETRIC SUDDEN EXPANSION 
Recirculation eddy length versus Reynolds number 



Figure 5: Length of the recirculation eddy versus Reynolds number: a comparison with the 
published results of Hutton and Smith and Shaw. 


767 




(565 nodes). The agreement is within 2% in all cases. 


Five open-square data points from the calculations of Shaw, using 600 rectangular linear 
elements are also shown for the Reynolds number range Re = 12.5 to Re = 100. The two 
lower points at Re = 12.5 and Re = 25 are second order accurate and are consistent with the 
other data. The remaining three points were obtained using a first-order scheme for 
advection. They underpredict the length of the recirculation by as much as 27% at 
Re = 100. Shaw attributed this to the coarseness of the mesh and the false numerical 
diffusion associated with the first-order upwind scheme. 


DISCUSSION AND GENERAL COMMENTS 

The above results give a representative sample of the tests to which the solver has been 
applied. On the basis of all tests, the following general comments are made and the 
subsequent conclusions drawn. 

It has not been found necessary to use any underrelaxation of variables to ensure 
convergence of the linear solver. The rates of reduction of the residual errors within inner 
iterations are typical of those to be expected for the PGS-based relaxation methods used and 
the simple inter-grid transfer operators being exploited. Note that, from the point of view of 
the coarse grid approximation, the values quoted are for the worst Navier-Stokes cases; 
those with low Reynolds numbers. They are nevertheless more than adequate for the 
problems attempted. The weak dependence of p on mesh size is an inevitable consequence 
of the primitive inter-grid transfer operators used. However, it is sufficiently weak to have 
little if any impact on the scaling of p. A higher order interpolation would be required for a 
better coarse grid approximation, and this is unlikely to be cost effective. 

Providing the computational mesh has a sufficient resolving power for the problem, rapid 
convergence superior to that possible with segregated solution methods is achieved. When, 
however, the mesh has insufficient resolution the convergence can stall (p — > 1) unless an 
explicit underrelaxation of velocity is exploited. This is thought to be due to the influence 
of the dispersive truncation error on the convergence process. For finer meshes, explicit 
relaxation is not required and rates of convergence improve with refinement, asymptotically 
approaching mesh-independent values as the resolution is increased (i.e., |3 — » 1 as 
Q — > oo). No evidence has been found for P > 1 in any applications so far. If this proves to 
be a better performance than that achieved with other defect-correction multigrid 
algorithms, the accuracy of the present discretisation may be responsible. 

CONCLUSIONS 

An efficient and robust iterative numerical method is presented for solving the coupled 
equations of motion for viscous fluids in the discrete second-order approximation. 
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Provided that discretisation has sufficient spatial resolution for the flow field, a rapid 
convergence to machine accuracy is achieved that is almost mesh independent insofar as 
the convergence rates either improve or are maintained for increased nodal concentration. 

With sufficient resolution, the method is also robust to the extent that no underrelaxation 
of flow variables has been required to ensure convergence. However, small amounts of 
underrelaxation can improve convergence rates. Converged solutions can also be obtained 
when the mesh resolution is insufficient to resolve the flow field, but in the more extreme 
cases of low resolution some explicit underrelaxation is necessary to prevent a stalling of 
the outer-iteration convergence. 

The discretisation provides accurate solutions on relatively coarse meshes. This is 
probably due to the interpolation scheme used for the momentum flux within elements, 
which is based on a local discrete solution of the equations of motion within the element. 
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Abstract 

In this paper we describe some classes of multigrid methods for solving large 
linear systems arising in the solution by finite difference methods of certain 
boundary value problems involving Poisson’s equation on rectangular regions. 
If parallel computing systems are used, then with standard multigrid methods 
many of the processors will be idle when one is working at the coarsest grid 
levels.We describe the use of multiple coarse grid multigrid (MCGMG) meth- 
ods. Here one first constructs a periodic set of equations corresponding to the 
given system. One then constructs a set of coarse grids such that for each grid 
corresponding to the grid size h there are four grids corresponding to the grid 
size 2*h. Multigrid operations such as restriction of residuals and interpola- 
tion of corrections are done in parallel at each grid level.For suitable choices 
of the multigrid operators the MCGMG method is equivalent to the parallel 
superconvergent multigrid (PSMG) method of Frederickson and McBryan. The 
convergence properties of MCGMG methods can be accurately analyzed using 
spectral methods. 


1 Introduction 


In this paper we describe some classes of multigrid methods for solving large linear 
systems arising from the numerical solution by finite difference methods of certain 
boundary value problems involving Poisson’s equation 
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Uxx Uyy — f { x i I/) (IT) 

on rectangular domains. Here f{x,y) is a given function. The solution u(x,y) of 
(1.1.) is required to satisfy the Dirichlet condition 


u{x,y) = g(x,y) (1.2) 

on the boundary. The standard 5-point finite difference equation is used to derive a 
linear system of the form 


Au. = b (1.3) 

Standard multigrid methods often exhibit excellent convergence rates on sequen- 
tial computing machines. However, if parallel machines are used, many of the proces- 
sors will be idle when the program is working on the coarse grid levels. Frederickson 
and McBryan [3] developed and analyzed a method, called the “parallel superconver- 
gent multigrid (PSMG) method.” With the PSMG method the same number of grid 
points are used and more of the processors are used at all grid levels. For other works 
dealing with the idea of using more than one coarse grid to speedup convergence cf. 

[2], [4], [6], [9], 

In this paper we describe a class of multigrid methods which we refer to as “mul- 
tiple coarse grid multigrid methods” (MCGMG methods) where, as in the case of 
PSMG methods, more than one coarse grid is used at each coarse grid level. 

With a MCGMG method, one first constructs a periodic set of equations corre- 
sponding to the given system. One then constructs a set of coarse grids such that for 
each grid corresponding to the grid size h there are four grids corresponding to the 
grid size 2 h. The actual number of coarse grids depends on which coarsening scheme is 
used. There are many ways to choose the multigrid operators for a MCGMG method. 
For suitable choice of the operators the MCGMG method is equivalent to the PSMG 
method of Frederickson and McBryan. The convergence properties of MCGMG meth- 
ods can be accurately analyzed using spectral methods; see, e.g., [7]. The analysis of 
many other iterative methods based on such a periodic set of equations can be found 
in, e.g., [1], [5], [8]. 

In Section 2, we derive Dirichlet problems and construct related discrete periodic' 
problems corresponding to (1.1) and (1.2). In Section 3, we apply a procedure to 
derive a discrete periodic problem corresponding to a discrete Dirichlet problem. 
In Section 4, we discuss the use of MCGMG methods for solving discrete periodic 
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problems. In Section 5, we show that a certain choice of multigrid operators can make 
a MCGMG method equivalent to some well known parallel multigrid methods. We 
also give convergence factors for the MCGMG methods and the standard multigrid 
methods for discrete Dirichlet problems. 

It should be noted that the methods described in the paper have only been shown 
to apply to problems involving Poisson’s equation on the rectangle with Dirichlet 
boundary conditions. However, it can be shown that with slight modifications, the 
method also applies to problems involving Neumann boundary conditions. 

As pointed out by the referee, the methods used in the present paper are closely 
related to more general methods based on the use of symmetries; see for example [3] 
and the references given therein. 


2 Discrete Dirichlet Problems and Discrete Peri- 
odic Problems 


In this section we consider classes of discrete Dirichlet problems and discrete peri- 
odic problems in one and two dimensions. First, we consider the Dirichlet problem 
involving the differential equation 

-u" = f(x) 0 < x < 1 (2.1) 


and the boundary conditions 


u(0) = a , u(l) = (3 


(2.2) 


To define a discrete Dirichlet problem we choose an even positive integer N and the 
grid size h = N~ x and seek a function u(x) defined on the points x = 0, fi, 2 h, . . . , Nh 
such that 

2u(x) — u(x + h) — u{x — h) = h 2 f(x ) 

< x = h,2h,...,(N -l)h (2.3) 

u(0) = a, u( 1) = (3 

For the case N = 4, this leads to the linear system 


2 -1 

0 


u(xi) 


h 2 f(x x ) + a 

-1 2 

-1 


u(x 2 ) 

= 

h 2 f(x 2 ) 

0 -1 

2 


u(x 3 ) _ 


h 2 f(x 3 ) + (3 


(2.4) 
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Since the matrix of the system (2.4) is nonsingular, a unique solution exists for any 
a, P and f(x). 

Let us now consider a periodic problem with period P — 1 based on (2.1). We 

require that u(x) be periodic with period P and that (2.1) holds for all x. We also 

require that f(x) be periodic with period P and that 

[ P f(x)dx = 0 (2.5) 

Jo 

We now define a discrete periodic problem as follows. We require that u(x) be 
periodic of period P on grid points 0, ±/i, ±2 h , . . ., and that u(x) satisfy 

2 u(x) — u(x + h) — u{x — h) = h 2 f(x), x = 0, ±2/i, . . . (2-6) 


We also assume that f(x) is periodic of period P and that, instead of (2.5), we have 

N - 1 

E fi = o < 2 - 7 ) 

o 

where h — P/N and where fj = f(xj ) , j = 0, 1, . . . , /V — 1 and Xj — jh. 

To actually solve the periodic problem defined by (2.6) it is sufficient to consider 
a finite subset of points. Thus in the case M = 4 we have 

2u 0 — u_ i — «i = h 2 f 0 
2i/i — uq — U2 — h 2 fi 

2 u 2 - ui - u 3 = h 2 f 2 (2-8) 

2 u 3 — u 2 — u 4 = h 2 f 3 

2U 4 ' U3 “■ U5 — Jl ^4 

where u\ = u(jh) and fi = f(jh). By periodicity we have u_i = u 3 and u 5 = «i. 
Thus we obtain the system 





h 2 f{x 0 ) 
h 2 f{x 1 ) 
h 2 f(x 2 ) 
h 2 f{x 3 ) _ 


(2.9) 


It can be shown that the matrix of the above system is singular and the rank is 
fV — 1 = 3. Since the null space of A is spanned by the vector (1 1 1 1) T and since 
the system is consistent by (2.7), it follows that (2.9) has a solution which is unique 
to within an additive constant. 
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For general M, the eigenvalues of the operator defined by the left member of (2.6) 


are 


u s = 2 — 2 cos(2s7t/i); 
and the corresponding eigenvectors are 


,0 


} (x) 


27 risx . 


0,l,...,N-l 


s = 0, 1, . . . , N — 1 


( 2 . 10 ) 

(2.11) 


For the two-dimensional case we first consider the Dirichlet problem involving the 
Poisson equation 


-u xx - Uyy = f(x , y) 0 < x < 1; 0 < y < 1 (2.12) 

with 

u(x,y) = g(x,y) (2.13) 

on the boundary of the square 0<x<l,0<y<l. To define a discrete Dirichlet 
problem we choose a positive integer N and the grid size h = iV _1 and we seek a 
function u(x, y) defined on the grid points (jh, kh), j,k = 0,1, ..., N such that 


4 u(x, y) - u(x + h, y) - u(x - h, y) 

-u(x,y + h) — u(x, y — h) = h 2 f(x,y) 
x,y = h , 2 h, ... ,(N — l)h 
u{x, y) = g(x,y) 

x = 0 and x = 1; y = h, 2 h, . . . , ( N — 1 )h 
y — 0 and y = 1-; x — h, 2 h, . . . , (. N — 1 )h 

Using (2.14) one obtains a linear system of the form 

Au = b 


(2.14) 


(2.15) 


where A is an (N — l) 2 by ( N — l) 2 matrix. As in the one-dimensional case, the 
matrix A is nonsingular; hence, a unique solution to (2.15) exists. 

As in the one-dimensional case we can define a discrete periodic problem with 
periods P = 1 in both the x-direction and the y-direction. We require that 

4 u(x, y) - u(x + h,y) - u(x - h , y) 

-u(x, y + h ) - u(x, y-h) = h 2 f(x , y) 



for x,y = 0,±h,±2h,. . .. Also, we assume that f(x,y ) is periodic with period P in 
x and y and that 

AT— 1 N - 1 


EE/W“) = 0 (2.17) 

j=0 k = 0 
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It can be shown that if (2.17) holds then a solution to the discrete periodic problem 
defined by (2.16) exists and is unique to within an additive constant. Moreover, the 
eigenvalues and eigenvectors of the discrete operator defined by the left member of 
(2.16) are, respectively, given by 

i/ Si t = 4 — 2 cos(27r sh) — 2 cos(27t th) (2.18) 

and 

v^' t \x,y) = e 2 ™ x e 2irity \ s, t = 0, 1, . . . , IV — 1 (2.19) 


3 Construction of Discrete Periodic Problems 


In this section we describe a procedure for constructing a discrete periodic problem 
corresponding to a given discrete Dirichlet problem of the type defined in Section 2. 


We will illustrate the procedure for a problem in one dimension with h = 1/4 and 
M = 4. The procedure for the two dimensional cases is similar. From (2.4) we obtain 
the system 


2 -1 

-1 2 

0 -1 



Ui 


' 6l ' 


U2 

= 

&2 


U3 


b 3 


(3.1) 


where /*• = f(xi), i — 1,2,3 and 


h = h 2 fi + a; 

< &2 = h 2 f2 ( 3 - 2 ) 

b 3 = h 2 f 3 + (3 

V 


We now define fe; for i = 0, ±1, ±2, ... as follows: 

0 = So — ^4 = b - 4 = bs = fe- § = . . . 

b\ — b\ = — b—i — bf — 7 = 69 = — 6_g = . . . 

< &2 = 6 2 = —b- 2 = — f>6 = b- 6 = 610 = —6-10 = • • • ( 3 - 3 ) 

= ^3 = —b - 3 = —65 = 6_ 5 = fen = —6_ii = . . . 


Clearly we have bj +s = bj for j = 0, ±1, ±2, . . . and 

r+7 

E h = 0 (3.4) 

3 = 3 * 
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for j* = 0, ±1, ±2, — 

We now consider the system 

2wj — Wj +1 — tOj_i = bj, j — 0, ±1,±2, ... (3.5) 

where we require that 

Wj +S — Wj, j — 0, ±1,±2, ... (3.6) 

It is easy to show that a necessary and sufficient condition that w is a solution of 
(3.5) - (3.6) is that w is a solution of the system 



It is also easy to show that the rank of the matrix of the system (3.7) is 7 and that 
the null space is spanned by the vector (1111111 1) T . Therefore, because of (3.4) 
the system is consistent and has a solution which is unique to within an additive 
constant. 

It should also be noted that if 

Ui 

u = u 2 (3-8) 

u 3 

is a solution of the original system (3.1) then u is a solution of the expanded system 
(3.7) where 



in 



Let w be any solution of the expanded system (3.7). Then since (3.7) has a unique 

solution to within an additive constant it follows that for some constant c 

r 

U\ = Wi + c 

< U 2 = w 2 + c (3.10) 

u 3 = w 3 + c 

If one requires that the sum of the components of w vanish, then w = u must hold, 
since the sum of the components of u vanishes. 

We remark that the process of replacing a vector to by a vector w' = w + c such 
that the sum of the components of w 1 vanishes is referred to as purification. Thus, if 
w is a vector of order N and if w 1 is given by 

1 N 

w i = w i-J^Yl w i ( 3 - U ) 

v j=i 

for i — 1, 2, . . . , N, then w 1 is the purified vector corresponding to w and we let 

w’ = V{w) (3.12) 


4 Multiple Coarse Grid Methods 

4.1 One Dimensional Case 

Let Xj = jh with h = 1/N and 

Slh = {xj\j = l-N,...,N}. (4.1) 

be a grid on the interval ( — 1,1], where N = 2 k for some positive integer k. We 
construct two coarse grids in such a way that all the even-numbered grid points 
belong to one coarse grid and all the odd-numbered grid points belong to another. 
Then, we have 


Q- — {xj | Xj £ flh and (j = even)}, (4-2) 

= {xj | Xj G £lh and (j = odd)}. (4.3) 

Figure 1 illustrates the grids on two levels, h and 2 h for the case N = 4. 

A two-level MCGMG algorithm for the above problem is given in Figure 2. For 
the following analysis, we assume that the full weighting restriction of residuals and 
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Figure 1: Two- Level Grids in ID with h = 1/4 


Algorithm: UCGMG2L{A h ,u ( ° ) ,b h ) 


1. Do mi pre-smoothing iterations using the smoothing iterative method (e.g., 
damped Jacobi method) to obtain u' h . 

2. Compute the residual rh — bh — Ahu' h , restrict the residual onto the coarse grids 
and perform purification defined in (3.11) if necessary to obtain 

4V = rfe' = nRVn) 

where and are the eigenvectors in the null spaces of A^t' and , 
respectively. 

3. Solve the coarse grid systems 

/t(+)x(+) _ r (+) — r 

A 2 h °2h — r 2h 1 A 2 h °2h ~ r 2h 

to obtain the purified solutions and & 2 h • 

4. Interpolate 8$ and 8$ onto the fine grid to obtain the new approximate 
solution 

< = < + i(d +) 4t ) + d'W)- 

5. Do m 2 post-smoothing iterations using the smoothing iterative method and 
purify the result, if needed, to obtain . 


Figure 2: The ID Two-Level MCGMG Algorithm 
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linear interpolation of corrections are used. The full weighting restriction is defined 

by 


(R^r h )(x) = 


\{r h (x - h) + 2r h (x) + r h (x + h)) x <E 0+ 


x £ fL 


(4.4) 


(R[ y r h )(x) 


0 x £ 0-j 

\{r h (x - h) + 2 r h (x) + r h (x + h)) x € 


and the linear interpolation is defined by 


(4.5) 


(Pf'hMx) 


$2h(x) 


x E fL 


-,{$2h{x — h) + S 2 h(x + h )) 


(4.6) 


(pMfe.K*) = ( 


\{5 2 h{x -h) + 8 2h (x + h)) x e 0+ 

s 2 h(x) x e o_ 


(4.7) 


The coarse grid difference operators are defined by the 3-point difference formula, 

e-g-, 


UMfKU = (2h)-^\x)-5^(x-2h)-S l i\x + 2h)] 

x € S1+ (4.8) 

( 474 ’)(^) = (2h)- 2 {2Sk\x)-5k\z-2h)-6ti(x + 2h)\ 

x G (4.9) 

The 2fi coarse grids can be divided into even coarser grids in a similar way. Figure 3 
illustrates all the grids on three levels, fi, 2 h and Ah for the case N = 4. Figure 4 
shows the corresponding hierarchical relations among these grids. 

A multilevel MCGMG algorithm is similar to the two-level version except the 
coarse grid problems in step 3 are solved by using algorithm MCGMG2L recursively. 
For a better understanding of the multilevel MCGMG algorithm, we list a three-level 
MCGMG algorithm in the following. For convenience of representation, we use the 
symbol v instead of 5 to represent the solutions and b to represent the right-hand side 
vectors on all levels. The solutions on coarse grids should be thought of as corrections 
to the solution of the fine grid. 
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Algorithm: MCGMG1D3L(A a , u ( h °\ b h ) 


1. Do m! smoothing iterations on AhUh = bh with initial guess Vh- 

2. Compute 

4 ’ = m^n), 4 * = P(Hi-> rh ) 

3. Do mi smoothing iterations on 

/i 2fc U 2h — °2h > A 2h U 2h ~ °2h 
with initial guesses = 0 and v^h = 0. 

4. Compute 

4 +) = 4 _) = p(4r ) 4t > ) 
4 +) = i>ir’ = nMr’4 1 ) 


5. Solve 


4 (++)..(++) 
A 4h U 4h 

/l(-+) (-+) 

n 4 h u 4h 


°4 h ) A 4h U 4h — °4h 


l(-+) 
■ ) 4h ) 


A — ),.( — ) _ v 

A 4 h U 4h ~ °4h 


4h 
(— ) 


6. Correct 


4’ <- 4’ + i(4 ++) 4 +) + pS''” lt _) ) 


V 2h 


(-) 

2/i 


+ o(^ 


(-+)„(-+) 

2h V 4h 


+ P, 




2h u 4h 


) 


7. Do m 2 smoothing iterations on 

/((+),.(+) _ /,(+) >*(—)--(—) _ u(~) 

A 2 h U 2h — °2h > A 2 h U 2h ~ °2h 

with initial guesses v $ and z4 h\ respectively, and purify the results if necessary. 

8. Correct 

**«-<* + j(d +) «tt ) + d _) 4 ) ) 

9. Do m 2 smoothing iterations on AhUh = bh with initial guess Vh and purify the 
results if necessary. 
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Here we used the purification notation V(v,z) defined in (3.11) and (3.12). In the 
case of N = 4, the two 2 h coarse grid systems on the second level are given by 


and 


a (+),.(+) _ 
A 2h V 2h — 


(2 hf 


2 -1 

-1 2 

0 -1 

-1 0 


0 

-1 

2 

-1 


-1 



(V2h)~3 


(V2h)-1 


(V2h)l 


(V2h ) 3 


(b 2 h)-3 


' (&)-• ' 

(b 2 h)-i 


(iff). 

{b 2 h)i 


(&’)i 

(b 2 h)3 


or 
to - 7 ' 

S- + 

to 


~ °2h 


i(-U-) 

l 2 h V 2h 



2 -1 0 -1 ’ 


(V2h)~2 

1 

-12-10 


(V2h)0 

(2 hf 

0-12-1 


(V2h}2 


CM 

r-H 

1 

o 

r-H 

1 

1 


to 

! 


(b2h)-2 


' (btiu ' 

( b 2 h)o 


( 42 )o 

(^2/1)2 


(&’). 

•JS 

(N 

l 


cr* 

to ' 

1 


= b{ 


2h 


(4.10) 


(4.11) 


Here we use v 2 h and b 2 h to represent the fine grid vectors which consist of the coarse 
grid vectors v^\ and b^\ b^\ respectively. 

On the third level, the four 4 h coarse grid systems are given by 


.(++)„(++) _ _L 


Hh 


J 4h 


(4 hy 


-2 2 


(V4h )~ 3 
{V4h)l 


(&4/i)-3 


’ (*>S +, )o " 

(&4/i)l 




H++) 

°4h 5 


(4.12) 


/«(+-)„.(+-) 
n 4h V 4h 


(4 hy 


2 -2 

-2 2 


rH 

1 

-0 

L..„. — 


’ (f>lt _) )o ’ 

(^4/1)3 




(^4/i)-l 
{V4h ) 3 

— °4h > 


(4.13) 
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2 -2 

-2 2 


{V4h)~2 

{V4h)2 


cs 

i 

Tf 

1 


1 

O 

+ 

1 -e 
*0 

(&4/i)2 


1 

+ 

1 < 
wTt* 

rO 

1 


(4.14) 


A 


(—)„(—) _ 1 

to 

1 

to 


(v 4 h ) 0 

4h 4h (Ah) 2 

1 

1 

to 

to 

1 - 


(V4h)4 



L 

J L 

(&4/i)o 



I 

1 


. (Ar ] ) 1 


(4.15) 


Here each of the fine grid vectors v±h and 64/1 consists of four corresponding Ah coarse 
grid vectors. On the third level, the grid points on a coarse grid are not always 
distributed symmetrically about zero. The systems (4.12) and (4.13) may not be 
consistent in general. However, one can make such a problem solvable by purifying 
the right hand vector. 


4.2 Two Dimensional Case 

In the two dimensional region [ — 1, l] 2 we can define a grid 

Q, h = {{xj,y k ) | j,k = 1 - TV, . . . , N} (4.16) 

where Xj = jh , yj, = kh and h = 1/Ah On this fine grid, the four coarse grids can be 
defined as illustrated in Figure 5 in the case of IV = 4. 

A two-level MCGMG algorithm in 2D is a straightforward extension of the cor- 
responding two-level MCGMG algorithm in ID defined in Figure 2. For a problem 
AhUh = bh with a given initial guess uj^\ a two-level MCGMG algorithm in 2D is 
given in Figure 6. 

As in the one dimensional case, a multilevel 2D MCGMG algorithm can be con- 
structed by recursively applying the two-level MCGMG method to each coarse grid 
system until the process reaches the coarsest grid level or some preset grid level. 
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(^++) 


(ft-+) 


Figure 5: Coarse Grid Points for a 2D Problem with h = 1/4 
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Algorithm: MCGMG2L(A fe , u^\ b h ) 

1. Do mi pre-smoothing iterations using the smoothing iterative method (e.g., 
damped Jacobi method) to obtain u' h . 

2. Compute the residual = bh — Ahu' h , restrict the residual onto each of the four 
coarse grids and perform purification if necessary to obtain 

r 2h = 'P(Rh ^ r h), s = ++, — h, H — , 

3. Solve the coarse grid systems 

_ r ( s ) . _ i I , , 

J± 2h°2h— r 2h) S ~ + + > r,H , , 

r r~ (») 

tor 02 h ■ 

4. Purify 5$ and interpolate the purified corrections 6^ onto the fine grid to 
obtain the new approximate solution 

$2h = 'PfyVi z 2h )> s = ++> — hd — , 

^ S 

5. Do m 2 post-smoothing iterations using the smoothing iterative method to obtain 
and return 

Figure 6: The 2D Two-Level MCGMG Algorithm 
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5 Further Discussion 


A special version of the MCGMG algorithm is determined by the selection of multigrid 
operations such as restriction of residuals and interpolation of corrections are done in 
parallel at each grid level. 

For instance, if one chooses a restriction operator defined by 

(R h S h )(x,y) = -^-(S h (x-h,y + h) + 2S h (x,y + h) + 6 h (x + h,y + h) 
lo 

+2 S h (x - h,y) + 4S h (x,y) + 2S h (x + h,y) 

+S h (x -h,y-h) + 2 S h (x, y - h) + 5 h {x + h,y - h)) 

(x,y)en h (5.1) 

and an interpolation operator defined by 


(Ph$ 2 h)(x,y) = S 2 h(x,y). 


(5.2) 


then one will get a MCGMG algorithm which is equivalent to the parallel supercon- 
vergent multigrid (PSMG) method of Frederickson and McBryan [3]. 

One can also construct a special version of MCGMG equivalent to the frequency 
decomposition multigrid (FDMG) method of Hackbusch [4] by defining the coarse 
grid matrices 

Am = R { h s) A h P ( h s) s = ++, — b , H — , (5.3) 

where the restriction operators are defined by 


»*2fc(*,y)-= ( R ( h +)r h){x,y ) 

= \{r h {x - h, y + h) + 2 r h (x, y + h) + r h {x + h, y + h) 

+ 2r h (x - h,y) + 4r h (x,y) + 2r h (x + h,y) (5.4) 

+ r h (x -h,y-h) + 2 r h (x, y - h) + r h (x + h, y - h)) 

(x,y) € 0 ++ . 

r 2 h(x,y ) = ( R ( h ~ +) r h )(x,y ) 

= | (~rh(x -h,y + h) + 2 r h (x, y + h) - r h (x + h,y + h) 

- 2r h (x - h,y) + 4r h (x,y) -2r h (x + h,y) (5.5) 

- r h (x — h,y - h) + 2 r h (x, y - h) - r h {x + h,y - h)) 

(x,y) € ft-+. 
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(5.6) 


r 2 h{x,y ) = {R^~ ] r h )(x,y) 

= \{~ r h(x -h,y + h)~ 2 r h {x, y + h) - r h (x + h,y + h) 

+ 2r h {x -h,y) + 4r h {x,y) + 2r h (x + h,y) 

- r h (x - h,y ~ h) - 2r h {x, y - h) - r h (x + h,y - h)) 

(x,y) G 0+-. 

r 2 h(x,y ) = (i?t __) r/ l )(x,y) 

= \( r h(x -h,y + h)~ 2 r h (x, y + h) + r h ( x + h,y + h) 

- 2r h (x - h,y) + 4r h (x,y) - 2r h (x + h, y) (5.7) 

+ r h (x — h,y — h) — 2 r h (x, y - h) + r h (x + h,y - h)) 

(x,y) £ fi— • 

and the interpolation operators Pjf'* are defined by 


h(x,y) ^ (P t h * +] S 2h )(x,y) 


< 


5 2 h(x,y) 

-(S 2 h(x — h, y) + S 2 h(x + h , y)) 
-^{8 2 h{x,y — h) + 8 2 h{x,y + h)) 

^( S 2h {x -h,y-h) + 8 2h (x -h,y + h) 
+5 2h (x + h,y-h) + 5 2h (x + h,y + h)) 


(x,y) £ n ++ 
(x,y) £ 0_+ 
(x,y) £ 0+_ 

(x,y) G SI— 


(5.8) 


8h(x,y) = ( Ph +) 8 2 h)(x,y) 

8 2 h{x,y) 

-y-( S 2h (x -h,y) + 5 2h (x + h, y)) 

= < |(^2 h{x,y~h)+S 2h {x,y + h)) 

S 2 h(x -h,y -h) + S 2 h{x -h,y + h) 

+& 2 h{x + h,y - h) + S 2h (x + h,y + h)) (x,y) £ H + _. 


(x,y) £ 0_+ 
(x,y)£ 0 ++ 
{x,y) £ 0__ 


(5.9) 
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Table 1: Observed Numerical Convergence Factors 


(mi,m 2 ) 

MCGMG 

SMG 

(0.1) 

0.15 

0.53 

(1,1) 

0.11 

0.36 

(1,2) 

0.08 

0.23 


fih(x,y) = (Ph + ) S 2 h )(x,y) 
/ 


^2h(x,y) 

(x,y) e fl+- 

|( & 2 h{x -h,y) + S 2h (x + h, y)) 

(x,y) e n__ 

~^-{$ 2 h(x,y -h) + S 2h (x, y + h)) 

(x,y) e n++ 

-^-(S 2h (x -h,y-h) + 5 2h (x -h,y + h) 


+S 2h (x + h, y - h) + S 2h (x + h,y + h)) 

(x,y) e 0_ + 

( Pt~ ] 52h){x,y ) 


$ 2 h(x, y) 

( x,y ) e fl— 

-Y^ 2 h(x - h, y) + S 2h (x + h, y)) 

(x, y ) e n + _ 

y-h) + S 2h (x , y + h)) 

(x,y) e ft_+ 

^2 h(x - h,y- h) + S 2h (x -h,y + h) 


+$ 2 h(x + h,y -h) + S 2h (x + h,y + h)) 

(x,y) <E 0++. 


(5.10) 


(5.11) 


corresponding to the four coarse grids fl++, fi_+, fi + _ and 0 , respectively. 

We used the MCGMG method to solve a test problem defined by (2.12) to (2.15) 
with the boundary function g(x,y) = 1 + xy and grid size h = 1/64. The restriction 
operators and the interpolation operators are defined by (5.1) and (5.2) respectively. 
A damped Jacobi method is used for smoothing with the damping factor 0.8. For 
comparison, we also ran the same problem using standard multigrid method with full 
weighting restriction of residuals and the bilinear interpolation of corrections. Table 
1 lists the observed convergence factors which are the average values of 3 cycles. The 
number of grid levels is 6. m i and m 2 are number of pre smoothing and number of post 
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smoothing respectively. The results indicate that the observed convergence factors 
of a MCGMG method are much smaller than the corresponding ones of standard 
multigrid method. 
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SUMMARY 

The nonlinear multigrid is an efficient algorithm for solving the system of nonlinear equations 
arising from the numerical discretization of nonlinear elliptic boundary problems [7], [9]. In this 
paper, we present a new nonlinear multigrid analysis as an extension of the linear multigrid 
theory presented by Bramble, et al. in [5], [6], and [17]. In particular, we prove the convergence 
of the nonlinear V-cycle method for a class of mildly nonlinear second order elliptic boundary 
value problems which do not have full elliptic regularity. 

INTRODUCTION 


Multigrid methods have been used extensively to solve linear systems of equations which arise 
in the numerical discretization of linear partial differential equations. We call such multigrid 
methods “linear multigrid methods” in this paper. With the development of the linear multigrid 
methods, the multigrid technique also has been applied to the numerical solution of nonlinear 
boundary value problems. Two important algorithms have been proposed so far. One is Newton- 
multigrid iteration, in which a linear multigrid method is used to solve the linear system that 
arises from a Newton iterative method [4]. The other one is the nonlinear multigrid method, 
which is an extension of the linear multigrid method to the nonlinear case [9]. In literature, it 
is also referred to as the Full Approximation Scheme (FAS) by Brandt in [7]. The convergence 
of the nonlinear multigrid method was first studied by Hackbusch in [9] and later by Reusken 
in [11] and [12]. Hackbusch’s nonlinear multigrid theory is based on his linear multigrid theory, 
while Reusken’s analysis is based on the linear multigrid analysis in [3]. 

Recently, Bramble, et al. have established a new linear multigrid theory [5] [6] [17] that has 
generalized the work in [3] and [9] in another way. Using this new multigrid theory, they have 
proved the convergence of linear multigrid methods with non-nested spaces or non-inherited 
quadratic forms, even with weak or no regularity assumptions. The purpose of this paper is to 
extend this new linear multigrid theory to the nonlinear case. 

In this paper, we present the framework of our new multigrid theory. In particular, we prove 
a basic convergence theorem for the nonlinear V-cycle scheme based on two abstract conditions, 
which are referred to as the “smoothing assumption” and the “approximation assumption”. 

’This work was supported in part by the National Science Foundation through award number DMS-9105437 
at the University of Houston. 
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We then apply it to show the convergence of the nonlinear V-cycle method with the damped- 
Jacobi-Newton smoother for a class of mildly nonlinear second order elliptic boundary valiie 
problems which do not have full elliptic regularity. Moreover, our new approach makes it possible 
to analyze the nonlinear multigrid method in more complicated cases, such as, nonnested 
spaces, non-inherited quadratic forms, numerical integration, and with weak or no regularity 
assumptions. We have shown the convergence of the nonlinear V-cyde method disturbed by 
numerical quadratures in [14]. We intend to study other cases in subsequent work. 

In comparison to the linear multigrid method, the nonlinear multigrid method has two ad- 
ditional parameters. In practice, their choice is an important issue. We investigate this issue 
numerically through a model problem in this paper. We note that this model problem, in part, 
aids in the understanding of the solution procedures used in the code UHBD [10]. 

The outline of the remainder of the paper is as follows. In Section 2, we introduce the basic 
idea of our nonlinear multigrid analysis. In Section 3, we present a general convergence theorem 
of the nonlinear V-cyde method based on two abstract assumptions, the smoothing assumption 
and the approximation assumption. In Section 4, we apply the theory of Section 3 to show the 
convergence of the nonlinear multigrid method for a class of mildly nonlinear elliptic boundary 
Value problems. In Section 5, we present numerical experiments with the nonlinear multigrid 
method focusing on its two auxiliary parameters. 

THE NONLINEAR MULTIGRID METHOD 


We consider a nonlinear variational problem coming from a nonlinear elliptic boundary value 
problem with domain Vt as follows: Find u £ H , such that 

a(u, n) = 0 Vu £ H, (1) 

where H = H(Cl) is an abstract Hilbert space with inner product (•, *), and a(-, •) is nonlinear 
only with respect to the first variable. 

We assume that a(u,v) is //-bounded, that is, there exists a constant C , such that 

|a(u,u)| < C(1 + ||u||)||v|| Vu,v £ H, 

where ||u|| = yf( u, u). Using the Riesz representation theorem [1], we then write (1) as 

g(u) = 0, (2) 

where g : H -+ H is the nonlinear operator such that 

a(u,v) s= (g(u),v) \/v£H. 

We make another assumption on g below: 

Al) g is Frechet-differentiable on H , and the derivative of g at u, denoted by Dg(u), is a 
symmetric, positive definite, bounded linear operator from H to itself. 

From Al) it follows that Equation (2) has the unique solution u* [16]. 
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Let U C H be a neighborhood of u* and T be the image of U under g. Since g satisfies 
the above assumptions, the implicit function theorem [1] implies that g : U T is a homeo- 
morphism. Thus, for any / € T, there exists unique u € Z7, such that the following equation 
holds: 

9{u) = /■ (3) 

Hence, we may consider equation (3) in the following. 

Let u old be an approximate solution of (3)i The update u new of u old is defined by 

„,net y „,oM i „ 

U = U + q, 

with q being a correction term satisfying the following correction equation of u old : 

g(q + u old ) = /. (4) 

If q is an exact solution of (4), then a direct method for solving (3) is derived. But solving (4) 
is as difficult as solving (3), so we often construct an approximate operator R of p" 1 to simplify 
the computational work. 

In the linear case, the correction equation (4) is often written as 

s(») = /-sK“), (5) 

and the term / — g(u old ) is often referred to as the residual of u old . Clearly, if the operator R is 
defined by a linear iterative algorithm, then the linear iteration can be written as follows: 

U new = u old + _ g{u old )\ . (6) 

A key factor in the new linear multigfid theory in [5], [6] and [17] is the introduction of the 
operator R that characterizes the linear multigrid method, so the linear multigrid method can 
be expressed in form (6). 

However, when g is nonlinear, the correction equation (4) cannot be written as (5). Noting 
the important role of the residual term in the context of the multigrid method, we introduce an 
“approximate” correction equation of (4) as follows: 

g(sq + u) = f + s[f - g(u old )}, (7) 

where / = p(u), s is a given positive number and u a given vector. Both s and u are extra 
parameters, compared to the linear multigrid method, and they are chosen so that q approximates 
the solution q of (4) in some sense. Hence, the nonlinear multigrid method can be expressed by 

u— = u° ld + [R(f + s[f - g(u old )\) - u ] /s, (8) 

provided that the operator R is defined by the nonlinear multigrid iterative algorithm for solving 
g(u) — /. This is the main idea of our nonlinear multigrid analysis. 

In the linear case, we can simply set u ~ f = 0 and s == 1. Thus, (8) reduces to (6). In 
this sense, the nonlinear multigrid method defined by (8) is an extension of the linear multigrid 
method. 

To define a nonlinear multigrid operator, we need some further notation given below. 
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Let H be a finite element space with grid size h. Suppose that we have subspaces M k with 
inner product (•, •)*, satisfying 

M x C M 2 C • • • C Mt = H. 

Set gi = g, and define the nonlinear operator gk : M k — > Mk by 

(g k (u),v)k = a(u,v), Vu G M k , k = 1,2, •••,/- 1. (9) 

We define a projector Q k : M k + 1 -> M k by 

(Q k u, v) k = (u, v) k+ i, Vu G M k . 

Obviously, g k satisfies Assvunption Al), so there exist U k and T k such that g k is a homeo- 
morphism between them. Hence, for f k G Fk, we may consider the following equation 

9k{u) = f k , (10) 

and its solution is denoted by u* k . 

The smoothing process on M k is denoted by the operator 

S?(-J k ):M k ->M k (11) 

satisfying u* k = S™(u k ] f k )- We assume that S™ is Frechet-differentiable on M k . Here m indicates 
that S™ may be defined by m steps of a nonlinear relaxation iteration (e.g., the damped- Jacobi- 
Newton or the Gauss-Seidel-Newton [13]). Without confusion, we denote ^(u; f k ) as S™^). 

Denote E k = {C | C = fk + $k[fk ~ 9k(uk)} for all f k G M k }. Here u k , s k and u k are fixed, and 
fk = gk(uk )• We define the nonlinear multigrid operator B k on E* inductively in the following 
algorithm: 

Algorithm 1 Given positive integers mi, m 2 and p. 

0) B\ = gf 1 . 

For each ( k € S* with k > 1, there exists an f k G M k such that ( k = fk + s k[fk~ 9k(uk)]- 
We define B k (Ck ) in terms of B k -\ as folio ws: 

1) Pre-smoothing : v\ = S^ 1 (u k \ f k ). 

2) Coarse grid correction: u 2 = Vi + — — — 

Sfc — 1 

where q p is defined by (12). 

9i = 9%-i + [ B k -i(fk-i + s k -i[fk-i — 9k-i(qi-i)]) — Uk-i] /sfc-i, (12) 

for i = 1, 2, • • • ,p. Here qo = u k -\, and 

fk - 1 = fk - 1 + s k -iQ k -i[fk — gk(y 1 )]. (13) 

3) Post-smoothing : 

B k ((k) = sjb[5** a (u2; fk) ~ u k ] + u k . (14) 
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We note that Algorithm 1 using u k = u k = 0, s k = 1, and p = 1 reduces to the linear 
multigrid algorithm described in [5], [6] and [17] provided that g is linear. 

THE CONVERGENCE ANALYSIS 

In our nonlinear multigrid analysis, we need a new inner product bk(u,v) defined by 

b k (u,v) = (Dg k (u* k )u,v) k , Vu,u 6 M k . 

From Assumption Al) we see that b k (u,v ) is symmetric, positive definite. 

With this new inner product, we define an orthogonal operator P k : M k+ i -4 M k by 

b k (P k u,v) = b k+i (u,v) Vu € M k . 

From the definitions of Q k and P k an important equality follows: 

Q k -iDg k {u* k ) = Dg k -i{u k _ x )P k -i, k = 1,2, •••,/. (15) 

Using the nonlinear multigrid operator B k , we define the nonlinear multigrid method as 
follows: 

u k +1 = tpk(u k ) j = 0, 1,2, •••, (16) 

with the operator x[> k : M k -4 M k being defined by 

V’fc('^fe) = ^k T B k (f k H~ $k[fk 9k^k)\) fifc /s k . (1^) 

Noting that g k (u k ) = f k and S™'(u k ] f k ) = u k for i = 1,2, we can show by induction that 

B k (fk) = u k . (18) 

Thus, the scheme (16) is consistent in the sense that u* k is a fixed point of the sequence {u k }. 

A fundamental recurrence relation with respect to the nonlinear multigrid operators B k is 
given in the following theorem. 

Theorem 1 The fundamental recurrence relation for the nonlinear multigrid operators B k) 
defined by Algorithm 1, is 

I - DB k (f k )Dg k (ut ) = DSp(u* k ){I -[/-(/- DB k . 1 {f k ^ 1 )Dg k . 1 {ii k . 1 )Y] (19) 

Dgk^u^r'Dgk^iut^Pk-^DSrK), 

where k = 1, 2, • • • , l, and u* k is a solution of g k (u k ) = f k on M k . 

Proof. Using (14), we immediately get the following equality: 

u„ + \B k (ft + s k {f h - g k (u k )}) - u k ] /s k = ST (ST(«i) + 9,< ‘ Ut) ~ U - ),V«k € A*. (20) 

L $k — 1 
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(21) 


The expression (13) of fk~i(u) follows 

fk-i( u t) = /*- 1- 

Then, by the induction and (18), we can show that 

q*(vl) - tijb-i, for i ~ 0, 1, 2, • • • ,p. (22) 

Thus, differentiating with respect to Uk at u k on both sides of the equality (20), and using (22), 
we get 

I - DB k (fk)Dg k (ut) = DSp(ut)[DSrK) + D qp (u* k )/s k ^}. (23) 

Here the operations are based on the calculus in Hilbert space [1]. 

Using (21) and (22), we see that 

D qi (u* k ) = [/- DBk-x(h-i)Dgk^{uk~i)}D qi ^(ul) + DB k ^(fk^)Df k -i(ut). 

In addition, with (13) and (15), 

Dfk. iK) = d»K)0STK) = (24) 

Hence, 

Dq P (u* k ) = {/ + [/- DBk-xifk^Dgk-xiuk-i)} + ■ ■ ■ (25) 

+ [/- ^5^ 1 (/ fe . 1 )Z)^_i(u fc -i)r 1 }PH,_ 1 (A_ 1 )Z?/ jt _ l (un 

= [/-(/- iD^„i(A_ 1 )Z?^l(«A:-_l)) P ] J D^_l(f2 A: _l)'- :l J C>/ fc _ 1 (^) 

= -«*_![/-(/ - i?^*„ 1 (A- l )^*_ 1 (fifc- 1 )) p ]i?flrA:-.i(wjb-.i)“ l /?flrjfe-i(Wfc_i)i s 5fe-xZ>5r(«*)* 

Therefore, the equality (19) follows by substituting (25) into (23). D 

The schemes (16) with p = 1 and 2 are often used in practice. We refer to them as the 
V-cycle and the W-cycle methods, respectively. In this paper, we only consider the convergence 
of the nonlinear V-cycle method. The discussion of the other cases is similar. 

Setting p = 1 in (19), we immediately get a fundamental recursion relation of the V-cycle : 

I - DBkCfk)Dg k {ul) 

- D5rK)[/~^ t - 1 (A-i)%-iK-i)ft- i]^r‘K). (26) 

From the definition of b k (-, •), it follows that the inequality b h (u, u) < b k -i{u, u) may not hold 
for some u € Mk-i- Thus, operator I - DB k (fk)Dg k {u k ) may be negative with respect to the 
inner product b k (-, ■)■ To show the convergence of the V-cycle, it is sufficient to prove that there 
exists a constant r) k in [0, 1), independent of h k , such that 


|6j([/ - DB k (h)Dg k (u' k ) Ru)| < : “). 

Vu € Mk, 

(27) 

The following two basic assumptions are made to show (27): 



M(/-ft-i)«,«)| < c»( l|Pgt (y' “»* )%(u,u)' 

-e, Vu € M k , 

(28) 

< Cshdl - DSl(ut)} u,u), 
Afc 

Vu € M k , 

(29) 
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where \ k is the largest eigenvalue of Dgk{u* k ), and 0 < (3 < 1. (28) and (29) are referred to as 
“the regularity and approximation assumption” and “the smoothing assumption”, respectively. 
The following theorem provides an estimation for a value of the parameter p k - 

Theorem 2 Let B k be defined by Algorithm 1 with p = 1 and mi = m 2 = rn. Assume that 

a) Assumptions (28) and (29) hold. 

b) The smoothing process S™ is formed by m steps of the nonlinear relaxation method Sk, 
such that DSk{u%) is symmetric and non-negative with respect to inner product b k {-, •), and 

DS?(ul) = [DS*K)P- 


c) The auxiliary vector ttj =t u\. 

Then there exist two constants, independent ofh k , 


such that 


9k, 1 


M(k) 

+ M(k ) 


and pk , 2 = 1 



cjcS \ “ 

( 2 m)^ J ’ 


rf k ' % bk(u,u) < b k ([I ~ DBk{fk)Dgk{u* k )]u,u) < p k ,ibk(u,u), Vu € M k . (30) 

Furthermore, if m is sufficiently large, then the estimate (27) holds with 

p h - max{|77 fei i|, |^ (3 |} < 1. 

Here M(k ) is a positive constant related to Cp,Cs,rn,(3 and k. Its detail expression can be 
found in Theorem 1 of [5], 

Proof With bk(DS'k‘(uk)u,v) = b k (u, D Sjf (u* k )v) , (26) and the definition of P k -\, we have 
&*,([/-. DBk(f k )Dgk(ul)}u,u) = b k ((I-Pk~i)DS?(vt)u,DS?(ut)u) 

We now show (30) by induction on k. For k = 1, we have B\ == gf l and u\ = u^. Thus, 

\bi([I - DBiif^DgiiuDj^u)] = 0. 

Suppose (30) holds for k — 1. We first prove the right hand side of (30). By induction, 

b k ({I - DB k {fk)Dgk(u*k)]u,u) 

< b k ((I - P k -i)DS™(u*k)u, DSr(ut)u) + pk-i,ibk-i(Pk~iDS?(u* k )u, H,flS?(«» 

= b k ((I - P k ^)DS?(ut)u,DS?(ut)u) + pk„ h MPk-iDSZ(ul)u,DS?(ul)u) 

= (1 - Vk~i,i)b k ((I - Pk-i)DSZ(ul)u, j?S?(u;)u) + pk^MDST(u* k )u, DSk‘(u* k )u). 


m 



By (28), (29) and the generalized arithmetic mean inequality, 

b k ((I - P^DSZiuDu, DS™(u* k )u) 

s' /~i2(\\^9k( L u )DS k (u k )u\\ k .0^ . npm/ *\ \1 -j3 

< r fb k {Vb k [u k )u,DS k (u k )u) p 


< C%(3r k 


\\Dg k (ul)DS?(ut)u\\l 


+ (1 -P)r~ k ^b k (DSrK)u,DSr(ul)u)) 


< Cl\!3r k Csh((I - DS k (u' k ))DS^(ul)u t u) + (1 - fj)r~ k “» b k (DS£(ul)u, DS?(ul)u)} 

< c l\Pr " DSl m (u’ k ))u, u) + (1 - py k ^h(DSZ(ul)n, £>SZ»)]. 

Combining the above inequalities gives 

&*([/- DB k (f k )Dg k (ul)]u,u) 

< [(1 )C}(1 + 

+(1 - ’l k -i.i)C 2 f C s ^r k b k ([I - DS 2 r(u- h )}u,u). 

Now, with the same proof as that in the proof of Theorem 1 of [5], we have that 

(1 — Vk-i,i)Cp(l — (3)r k 1 ’"' 5 + ?7fc— i,i < 


(1 - r] k - 1A )CpCs^r k < r) k<1 . 

This completes the proof of the right hand side of (30). 

We next prove the left hand side of (30). From the spectral properties of DS k (ul), it follows 

b k (DS?(ut)u,DS?(ut)u)<b k (u,u), A: = 1,2, •••,/. (31) 

Combining (31) and assumptions (28) and (29) gives 

-b k ((i- 

C 2 C P B 

< ^^((/-DSrK))-,”)]' h(DS^(ul)u,DST(ul)uY-' 1 


r'lnP 


[b k (u,u) - b k (DS?tu* k )u,DS?(ut)u)Y b k (u,u) 1 ^ < 


1-/3 / 


r<2r<P 


— (2 m y 1 K J K k v ’ k ' /J v *v u, > ~ (2m) 13 

where we have used the following inequality (which is similar to (3.16) in [5]): 

b k ([I - DS k [ul))DS^(ul)]u,u) < ±b k ({I - DSl m (ul)]u,u). 


b k (u,u), 


Let r k = ^1 + j f—p j • By the induction assumption, we have 

h-idl - DB k -i (f k -! )Dg k ^.i(u* k _ 1 )]u,u) > (1 - T^bk-^u, u), 
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which can be^wfitten as 


-b k -i([I - DBk-iUk-^Dgk-x^ul^lu.u) < -g k -\, 2 bk-\(u,u). 


Then, from the? above inequalities, we obtain 

-b k ([I - DB k (fk)Dg k (u k )}u,u) 

= -h((I - P t .i)DS?{ul)u, DS?(ul)u) 

< -kid - p^ds^uDu, DsrK)») - ^-1 Atn-iBsrKK DST(ui)u) 

= - Pt-i)Dsr(K)u, DS?(ut)u) - ds?KM 

< 

The proof of the left hand side of (30) is completed. □ 

With Theorem 2, we now can obtain a convergence theorem of the nonlinear V-cycle. 

Theorem 3 Let {«•£} be a sequence of iterative values of the nonlinear multigrid V-cycle algo- 
rithm, and let u* k be a solution of equation g k (u) = f k - If the assumptions in Theorem 2 hold, 
and m is sufficiently large, then there exists a constant a k with 0 < a k < 1, independent of grid 
size hk, and a neighborhood 0(u* k ,e k ) of u* k , such that all u k G 0(u* k ,tk), 


(2m) 13 


+ T k -i - 1 b k (u, u ) = (r k - 1 )b k (u, u) = -g k , 2 h(u, u ). 


\\ui +1 - ut\\b,k < crk\\u 3 k - u* k \\ b ,k j = 0,1,2, •••, 

when the initial guess u° k G 0(u* k ,e k ). Here || • the induced norm from b k (-,-), is defined by 
\\ u \\b,k =b k (u,u). 


Proof. Clearly, from Theorem 2 it follows that 


\\I — DBk(fk)Dgk(u* k )\\b,k = sup 

u 


\b k ([I — DB k (fk)Dgk(u k )]u,u)\ < 

b k (u,u) - nk ' 


For a^given positive number S k satisfying a k = 8 k + rjk < 1 , the differentiability of ip k at u k gives 
that there exists a neighborhood of u * k , 0(u* k , t k ) = {u k : \\u k — u* k \\ bi k < efc}, such that 


|| ipk(uk) ~ $k(u* k ) - Dif k (u* k )(u k - u* k )\\ b ,k < S k \\u k - *4|| 6 ,fc, 

where u k G 0(u* k ,e k ), e k is a positive number, and ip k is defined in (17). Thus 
|| ipk(uk) - u*k\\b,k = || ^fc(wjfe) “ i>k(u k )\\b,k 

< II ifk(uk) - k(u k ) - D^k(u k )(u k - u* k )\\ bi k + II D^k(u* k )(u k - u*)||m, 

5; ($k + ||-^V’A;(Wfc)|i6,fc)||ufc ~ 'Mfcll&.fc 5: — Uk\\b,k- 

Hence, by induction, for any u° k G 0(u* k , e k ), we can easily show that uj, G 0(u* k , e k ), and 

\\K +1 ~ K\\b,k < <Tk\\u k ~ U t\\h,k 3 = 0, 1,2, . 

□ 
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In a nonlinear multigrid algorithm, the following equations have been used on M k for k < l: 

9k(v ) = fk + s- k lf k ~ gk{u 3 k)], (32) 

and 

gkiy^ — fk -t~ &kQk\.fk+ 1 (^i)], (33) 

where u 3 k is the j - th iterate of the nonlinear multigrid method, and v\ is the iterative value after 
the pre-smoothing step of the nonlinear multigrid algorithm. Hence, to ensure that a nonlinear 
multigrid algorithm is well-defined, we should show that the solution of either (32) or (33) lies 
in the neighborhood 0{u * k , e k ) given in Theorem 3. 


Theorem 4 Let 0(u k ,€k) be a neighborhood of u* k . Assume that 

(a) There exists a constant C such that for all u € M k ||T>p^ 1 (tt)||6 i jt < C. 

(b) The auxiliary vector u k satisfies u k € 0(u k , C/t/2). 

(c) The auxiliary value s, k satisfies s k < — when r ^ 0, otherwise, s k = 0. Here 


r = max{||/fc - g k (u 3 k ) || 4l *, \\Q k [fk+i - (^i )] ||&,fc} 


, and Vi is the iterative value after the pre-smoothing. 

Then, the solution of either (32) or (33) lies in the neighborhood Q(u k , e k ). 


Proof. We only show that the solution of (32) lies in 0(u k , e k ). The proof for (33) is similar. 
Set r k = fk- gk(u 3 k ), and w = g^ifk + a*r*). If r k = 0, then w = u k € 0(u* k , e k ). If r k ^ 0, 
with assumptions (a) to (c), we have 

ik-^fclkfc ~ Wdk^fk + s kr k ) - *4 Ik* 

< Wdk^fk + skn) ~ u k \\b,k + - «*|k* 

= hkHlk + s k rk ) - gk l {fk)\\b,k + II Uk - ^||6,fe 
S ^k || h^9k ( u )|k*ll r fc|kfc T ~~ U k lk fc 

% £*C||r*|k* -f ||u* — UfclJft.ib < Cjfe/2 + tk ! 2 = Ck, 


i.e. w 6 0(u%, e k ). We complete the proof of Theorem 4. 0 

AN APPLICATION . 

In this section, as an application of the theory in Section 3, we consider the convergence of the 
nonlinear V-cycle for solving the second order elliptic, mildly nonlinear boundary value problem 

f - v(<*V«) + £(x,u) =./(*)». in ft, 

( u = 0, on dft, ^ ' 

where ft is a bounded, Lipschitz, polyhedral domain in R d , a € W 1,0 °(ft), a > C a > 0 a.e. on 
ft, and f G T 2 (ft). 
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Let D 2 B denote the derivative of £?(■,•) with respect to the second variable. We make the 
following assumptions on D 2 B in this section. 

A2) D 2 B(x,u) is continuous in Cl x R, and there exist constants C\ and C 2 such that 

0<C 2 < D 2 B(x,u ) < Ci. 

A3) D 2 B(x,u) satisfies a Lipschitz condition: there exists a constant L, independent of u 
and v, such that 

| D 2 B(x, u ) — D 2 B(x, u)| < L\u - u|, (35) 

for all (x,u), (a:,u) on a subset of Cl x R. 

Let H = Hq (Cl) be the Sobolev space [2]. The weak form of (34) is thus: Find u € H , such 
that 

a(u, v ) = (/, v ) L 2 , Vu e H (36) 

where 

a(u,v)^ / + and (f,v ) L 2 = / f(x)v(x)dx. (37) 

Let Mb be a set of piecewise linear functions with respect to a quasi-uniform triangulation 
Tb on Cl of size hb in the usual sense [8], We assume that there is a constant c , independent of 
k, such that hb-i < chb, and these triangulations should be nested in the sense that any triangle 
in Tb-\ can be written as a union of triangles of Tb- 

The finite element discretization for (36) on each Mb is as follows: Find Ub € Mb such that 

a(u k ,v) = (f,v) L 2 , \/v G Mb , (38) 


where k — 1, 2, • • • , /. 

Based on Theorem 39.12 in [16], we assume that 

A4) Equations (36) and (38) have unique solutions u* and Ub, respectively. For u* € H 1+/3 (Cl) 
with /3 € (0, 1], there exists a constant c, independent of hb, such that 

\\u* - u k \\x < ch p k , ( 39 ) 

where k === 1,2, • • • ,1, and || • ||i is the usual norm in Sobolev space H 1 [2]. 

We solve equation (38) by the nonlinear multigrid V-cycle scheme with the smoother S™ 
defined by m steps of the damped- Jacobi-Newton iteration. To prove its convergence, using 
Theorem 3, we only need to verify Assumptions (29) and (28). 

We first prove Assumption (29) for the smoother S™ below. 

Let be a natural nodal basis for Mb, where n k = dimM k - Apparently, we may 

consider the following equation on Mb'- For f k € M k , find u k € M k such that 

(^fe(Wfc), <fiv)b == (/fe) ^Pu)bi v = 1) 2, • • • , n k , 

with g k being defined by 

{g k {ub),v)h = a(u k ,v)-(f,v) L 2 , Vu € M k - (40) 
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Let u k be the j-th iterate of the damped- Jacobi-Newton iteration using a damping parameter 
9, expressed as follows: 

u\ +1 = U* + Rk(u{)[fk - 9k («*)] , 
where the linear operator R k (u) : Mk — >■ Mk is defined by 

R k {u)v = 9^(^~^(pi,cpi\ {v,(pi)k<pi Vv 6 M k . 
i = 1 V V U k,i ) k 

Since Sl(u) =u + Rk{u)(f k - gk{u)), and 

DSl(ut) = I - R t (ui)Dg k (u- t ), (41) 


we have 

DS ? (<) = [/ - Rk(ut)Dg k (ul)r = [051K)]”. 

Clearly, DS\(u * k ) is symmetric, so Assumption b) of Theorem 2 holds. From (41) we see that the 
Jacobi-Newton iteration has a similar form as the damped-Jacobi method in [17]. Therefore, 
using the same argument as in [17], we can show that Assumption (29) is satisfied by the 
damped- Jacobi-Newton iteration. 

We next verify Assumption (28). Let g be defined by 

{g(u),v) = a(u,v) - (f,v) L 2 , VveJ T. (42) 


It is easy to show that Dg(w), defined by 


(Dg(w)u,v) = / [a u v + D 2 B(x,w)uv]dx, 

J O 


Vue tf, 


is symmetric, positive definite on H. 

Hence, from (40) it follows that Dgk(w) is a symmetric, positive definite operator on Mk- 
Thus, the bilinear form on Mk x Mk 


b k (u,v) = (Dg k {w)u,v)k, Vu,v e M k , (43) 

is symmetric, positive definite. 

For simplicity, we let A k = Dgk(u* k ), and define a family of norms as follows: 

IIMIIr ,k= ( A k V ,v)k, VueMfc, 

where r is a positive nu m ber. In addition, we note that ||v|||ojfc is equivalent to ||w|]l 2 and 

We now can show that Assumption (28) holds in the following theorem. The proof of this 
theorem can be found in [15]. 


Theorem 5 Let M k be the space of continuous piecewise linear functions with respect to a 
quasi-uniform triangulation, and let u k be the solution of equation g k {u ) = f k in M k . Assume 
that (Al) to (A4) hold, and that the solution U of the variational problem 

b k (U,v) = {F,v) L 2 , VveH (44) 
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Figure 1: A comparison of a nonlinear V-cycle 
and a linear V-cycle. Here • • • : the linear V- 
cycle method for solving (46) with 6 = 0, +++: 
the nonlinear V-cycle method for solving (46) 
with a = 6 = 1, and h = yU. 


Figure 2: Dependency of the convergence rate 
of the nonlinear V-cycle on the auxiliary vector. 
Here + + + : u k = 0, — : u k = <?*«£+ i, 


- : u k = Sl 00 ( 0), • • 
and a = 6 = 1 in (46). 


u k = 0.5, h 


l 

128 ’ 


is in H 1+ ^(0) for some (3 € (0, 1], and satisfies 

\\U\\ h ^<C\\F\\ h ^ (45) 

for some positive constant C, independent of F. Then, there exists a constant C such that 

13 

l?\ 2 


I h((i - p t _ < c j b k („ lU ) 


Vue M*. 


where X k is the largest eigenvalue of Dg k (ul). 


NUMERICAL EXPERIMENTS 


In this section, we present numerical experiments with the nonlinear multigrid method for 
solving the following model problem [10]: 


■{u xx + u yy ) + 6 sinh (<zu) 

u 


= / in O = (0, 1) x (0, 1), 
= 0 on <90, 


(46) 


where a and 6 are positive numbers. The right hand side term / of (46) is chosen such that 
u = sin nx sin ny is the solution. 

The discretization equation of (46) is defined by the five-point stencil with h k = l/2 k (1 < 
k < l). The smoothing process S™ consists of m steps of the Gauss-Seidel-Newton iteration. 
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Figure 3: The relation of the relative residual of the nonlinear V-cycle with parameter Sk at the 
12 th V-cycle iteration. This figure shows that as Sk is around 1, the nonlinear V-cycle has an 
almost same convergent rate. Here h = and a — b — 1 in (46). 


We set m% — rri 2 = m for all grid levels and the coarsest grid size hi = - for all of our numerical 
examples. Besides, the full-weighting restriction operator Qk, [9], was used, and only one step of 
the Gauss-Seidel-Newton iteration was applied to get the solution of the equation on the coarsest 
grid Mi. The initial guess u° h — 0 and the relative residual stopping criterion were taken for 
all the numerical experiments, which were implemented on a KSR1 supercomputer with single 
precision, which is equal to the regular double precision. 

We compared the performance of the nonlinear V-cycle with the linear V-cycle method. The 
linear V-cycle case was obtained from the nonlinear V-cycle program by setting b ~ 0 in (46). 
Thus, a Poisson equation was solved by the linear V-cycle method. From Figure 1 we see that 
the nonlinear multigrid method is as efficient as the linear multigrid method. We checked the 
dependency of the convergence rate of the nonlinear multigrid method on its two parameters u 
and Sk- We used three different values of Uk in the experiments. 

1) Uk — 0 on all grid levels; 

2) Uk — S™( 0), i.e. Uk is defined by m steps of the Gauss-Seidel-Newton iteration with zero 
initial guess. Clearly, by increasing m, we can make Uk approach to the exact solution gk{u) = /*. 
as closely as desired. 

3) u-k — QkUk+ i, where denotes the iterative value after the pre-smoothing step of the 
V-cycle. We call this type of u,k Brandt’s choice because it was first used by Brandt in [7]. 
Figure 2 shows that if Uk is properly close to the solution of g k = fk, the convergence rate of 
the V-cycle will be almost the same. Otherwise, the nonlinear V-cycle may be divergent. For 
example, from this figure we see that the V-cycle with Uk = 0.5 was divergent. 

For fixed u,k — 0, we also made experiments with different values of Sk- Figure 3 shows that 
it is satisfactory to let Sk be around 1. 

Finally, we checked the influence of the a and b in (46) on the convergence of the nonlinear 
V-cycle method. The numerical results are reported in Tables 1 to 3. Here we used four different 
Uk, h = and mi = m 2 — 1 for all of these numerical experiments. We also used a — 1.0, 
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b = 1.0 and a = 3.0 in Table 1, Table 2 and Table 3, respectively. The notation — in the 

tables means that the V-cycle is divergent. From these tables we see that: 1) When 0 < a < 3 

and 0 <, b < 10, u k = 0 is the simplest choice; 2) Brandt’s choice worked for 0 < a < 6 and 

0 < b < 100; and 3) the nonlinear V-cycle with u k = S™(0) using large m can lead to conver- 

gence for a pair of a and b for which the nonlinear V-cycle with Brandt’s choice is divergent. 

Table 1 -.The performance of the nonlinear V- cycle as the b in (46) becomes larger. 



The Total number of Iterations 

b 

Uk = 0 

Uk — Qk u k + i 

u k = Sl(d) 

= s;°( o) 

10 

13 

14 

13 

14 

30 

40 

13 

14 

13 

100 

— 

12 

35 

13 


Table 2: The performance of the nonlinear V - cycle as the a in (46) becomes larger. 


a 

The number of Iterations 

Uk - 0 

uk = Qkrfkl i 

a* = sm 

u k = Sf(0) 

0.001 

14 

14 

14 

14 

2.0 

13 

14 

14 

14 

3.0 

32 

14 

14 

15 

6.0 

— 

12 

— 

30 

7.0 

— 

— 

— 

20 


Table 3: The performance of the nonlinear V-cycle for solving (46) with large a and b. 


b 

The number of Iterations 

u-k = 0 

Uk — Qk u k + i 

u k = sm 

% = SH o) 

0.01 

14 

14 

14 

14 

1.0 

32 

14 

14 

15 

20.0 

•— 

12 

— 

16 
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SUMMARY 

A highly accurate and efficient numerical method is developed for modeling 3-D reacting 
flows with detailed chemistry. A contravariant velocity-based governing system is developed for 
general curvilinear coordinates to maintain simplicity of the continuity equation and compactness 
of the discretization stencil. A fully-implicit backward Euler technique and a third-order monotone 
upwind-biased scheme on a staggered grid are used for the respective temporal and spatial terms. 
An efficient semi-coarsening multigrid method based on line-distributive relaxation is used as the 
flow solver. The species equations are solved in a fully coupled way and the chemical reaction source 
terms are treated implicitly. Example results are shown for a 3-D gas turbine combustor with strong 
swirling inflows. 


INTRODUCTION 

Combustion simulation generally requires the solution of the coupled equations of mass, momen- 
tum, species balance and energy with detailed thermodynamic and transport relations and finite-rate 
chemistry. In order to alleviate the strong interaction between the flow and combustion, and to avoid 
solving this huge system at the same time, the governing equations are usually solved in a semi- 
coupled way that the chemical reaction part and fluid flow part are treated separately. For the flow 
part, the mass, momentum and energy equations can be solved by using the existing CFD code; 
therefore, most efforts towards modeling combustion are concentrated on the reaction part. Many 
progresses have been made in solving the chemical species equations [1-8] . 

It is well realized that the reaction part, that involves multi-species, multi-step, finite rate kinetics, 
is a sensitive and stiff system, and it takes most of CPU time in most computations. Most of the 
successful combustion simulations are based on the coupled solution of chemical reaction system. 
There has not been found a general efficient way to decouple the system and reduce the cost in each 
iteration. Therefore the most effective approach is to reduce the iteration number. Since the flow 
field acts as the carrier of chemical reaction, it can be anticipated that a fast established flow field 
will provide a stable base for the reactions and therefore make the species equations easy to converge. 
As shown in our previous work [9,7,8], very efficient CFD methods will greatly reduce the iteration 
numbers of the reaction part which is very costly. Furthermore, for practical 3-D combustion, the 
flow field may be very complex, then the flow part could take considerable portion of the total CPU 
time. Therefore, the development of very efficient CFD methods and reaction modeling method is 
equally important in combustion simulations. 

This paper describes a very accurate and efficient numerical method we have developed for 
calculating general 3-D reacting flows with detailed chemistry. The principal focus is put on the 
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development of a high efficient solution method and high accurate scheme for chemical species 
transport equations. Based on the finite volume frame, an implicit method is developed to solve 
the 3-D Navier-Stokes equations and chemical species transport equations in general curvilinear 
coordinates. A distinctive feature of this method is that the contravariant velocities are employed 
as the dependent variables. The momentum equations of contravariant velocities are discretized in 
staggered control volumes while the energy equations and species equations are integrated basically 
by using a cell-centered finite volume scheme. In this way, the discretized mass equation remains its 
simple form as in the Cartesian grids and the stencil is spatially the most compact. A third-order 
monotone upwind-biased scheme by van Leer [10,11] is used for all the convection terms of flow 
equations and species equations to minimize numerical diffusion and maintain the sharp gradients 
present in flames. 

This method was tested by applying to calculate the strong swirling combustion in a 3-D gas 
turbine combustor. For a 49x65x65 grid of 207,025 grid points, the calculation takes only about 200 
time steps and 21.3 CRAY-YMP hours to reduce residuals by more than three orders of magnitude 
for all governing equations. 


GOVERNING EQUATIONS 


The governing equations for general compressible reacting flows in integration form can be sum- 
marized as follows. 


Mass conservation: 


Momentum conservation: 


f ^-dQ, + f pq- fids = 0 
Jo ot Jr 


J ^dfi + j pq[q-n)ds = J f n ds 


( 1 ) 


( 2 ) 


In low speed combustion, the kinetic energy is negligible comparing with enthalpy; therefore, the 
energy conservation can be simplified as [12]: 


f ^ — —d£l+ f ph(q-fi)ds = f ■ qds + [ X h(Vh-fi)ds 
Ja ot Jr Jr Jr 


Chemical species equation: 


f *¥^-dn + [ pY a (q-n)ds= f X Y (VY a -n)ds + f R a dQ 
Ja ot Jr Jr Ja 

a = 1, 2, • • • ,NS , 


( 3 ) 


( 4 ) 


Enthalpy and state equations: 

h = h(Y a ,T), P^J2~PRT. 


( 5 ) 


where t is time, 0 is a fixed control volume with boundary T, p is density, p is pressure, q is 
velocity vector, T is the temperature, h is the enthalpy, n is the unit outer normal vector of the 
boundary, r n is the total viscous stress acted on a surface with outer normal vector n, and R a is 
the chemical reaction rate of species a. R, Y a , and W a are the gas constant, the mass fraction and 
molecular weight of species a, respectively, and the specific enthalpy and species diffusion coefficients 
are determined from 

A - ( BL + JJI- 
K Pr L ' Prr ) ’ Y \Sc L Sc T , 


= I TT~ + W 
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where \ic is molecular viscosity, ftp the turbulent viscosity determined from turbulence model, Prz, 
and Prp are the laminar and turbulent Prandtl numbers, and Scl and Sct are the laminar and 
turbulent Schmidt numbers, respectively. From the constitutive relations, we have: 

M = -(p+f/*V-g)[J] + 2/i[e] (6) 


£ 




dqi dqj 
dxj dxi 


/* = y-c + PT 


(7) 

( 8 ) 


The enthalpy h and molecular viscosity ji can be calculated by the following formulas: 


h = JTYafla, 

a 


h a 

rp fp 

= f Cp a dT a = ho a + f Cp a dT a , 
Jo JTq 


c Pa 

= C°p a + C l P T + C 2 p o T 2 + Cp a T 3 + C 4 p a T 4 , 


t*c 

— ^ ] Yaflg, 


Va 

= A«“+A‘iT + /i»T 2 +A*|T 3 +^T 4 . 

(9) 


where ho a is the standard formation enthalpy of ath species, C Pa , C Pa , • • • , C Pa , n° a ,fx l a , ■ ■ ■ are 
polynomial coefficients for Cp a and fi a , respectively. 

All thermal and transport parameters are obtained by linking with CHEMKIN-II [13] standard 
libraries. 


CHEMICAL REACTION MODEL 

For laminar flames, the chemical reaction rate R a for the ath species can be calculated by 



where uj a is the molecular weight of species a, Nr is the total number of reaction steps, Ns is the 
total number of species, vf a (vj^) refers to the stoichiometric coefficient of products (reactants), and 
n, = 

The function Kj (Kj ) is the rate constant for the forward (backward) reaction step j. We 
assume Kj has the following Arrhenius temperature dependent form: 

K l ~ A j T P ex P(~^)’ ( n ) 

and Kj has a similar expression. The reverse rate constant can be written in terms of the forward 
rate constant and the equilibrium constant Kj as 

K) = Kj IK). (12) 

Here, K) are also obtained by calling CHEMKIN-II. The pre-exponential factor Aj , the temperature 
exponent a), and the activation energy Ej can be compiled from published experimental work. 

For turbulent reacting flows, the Algebraic Correlation Closure(ACC) model is used to introduce 
a correction term to the reaction rate [7,8]. 
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CONTRAVARIANT VELOCITIES AND STAGGERED GRID 


One may think use of contravariant velocity on staggered grid will result messy governing equa- 
tions and cause great difficulties in coding. However, that is not always true. Following are the 
reasons why we choose it to solve reacting flows on arbitrary grid: 

® Using staggered grid can result more accurate and robust schemes as concluded by numerical 
analysis and confirmed in previous calculations on regular Cartesian grids. 

• On general curvilinear grids, staggered grid method can be made of best use by combining with 
contravariant velocities. For each contravariant velocity component, the discretization stencil 
for its main direction pressure gradient is spatially the most compact, therefore eliminating 
the possibility of odd-even decoupling of pressure. 

• The use of the contravariant velocity also benefits the solution of mass, energy and chemical 
species equations. The flow convection can be accurately represented. 

• With use of proper discretization method and careful selection of definition locations of vari- 
ables, the governing equations can be kept simple enough for the momentum equations, and 
even simpler for all scalar conservation equations. 

• Most importantly, this method will retain the close relation between mass flux and pressure 
difference on curvilinear grids. Therefore the pressure-correction method can be used very 
efficiently. This feature yields a fast convergence rate on curvilinear grids which is similar to 
that on Cartesian grids. 

Let (u,v,w) be the velocity components in Cartesian coordinates (x,y,z), and (U , V, W) be the 
contravariant velocity under computational coordinates (£, 77 , £); their relations can be described as: 

U = J(u£ x + v£ y + w$ z ) 'j 

V — j(ur} x -(- vrjy + WT] Z ) > (13) 

W - 3 (u£ x + v£y +wC) J 

where J is the transformation Jacobian from (x,y,z) to (£, r), Q. 

From the above relations, the velocity components in x,y,z direction can be found: 

' u 1 H \ Jt* Jty Hz I ” 1 

v =A V , A= J rj x Jrjy 3 t) z (14) 

w j [_ W J L 

Equation 14 will be frequently used hereafter; for simplicity it is denoted as: 

qi = a, m U m (15) 

where [gi, q 2 , qz] T = [u, v, w] T and [U 1 , U 2 , U 3 ] T - [U, V, W] T . 

In this work, the basic scheme is the finite volume method. The computational domain is dis- 
cretized into a number of quadrilateral cells in two dimensions or hexahedral cells in three dimensions. 
As in Fig. 1, 1-2-3-4-5-6-7-8 forms a typical cell in three dimensional problems. In finite volume 
formulation, the contravariant velocities can be expressed as: 

U J+ %,j,k ~ (? • 55678),:+ ij.fc 

+ (9 ‘ 52376 ) 1 ^ + ^^ (1®) 

W.j,fc + i = (q- S 3 487) itjik + L 

where subscripts i,j,k denote the cell index in each of the three curvilinear coordinate directions, 
respectively. In order to retain the merit of staggered grid, the contravariant velocities are defined 
at different locations as shown in Eqn. 16 and Fig. 1. 
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c 

Figure 1: Cell Locations in Three Dimensional Grid 


Generally the face vectors are denoted as following for clarity: 

s 1 =St =con . i . = (S 1 *,S 1 »,S u ) 

5 2 = Sr,=consi. = {S 2x , S 2y , S 2z ) 

5 3 — S^- cons t. = (S 3x , S 3y , S 3z ) 

In the finite volume frame, equation 13 and equation 14 are expressed as: 

U = uS lx + vS 1 * + wS u ) 

V = uS 2x + vS 2y + wS 2z \ 

W = uS 3x + vS 3y + wS 3z J 


- S U 

s ly 

S U 

S 2x 

S 2y 

S 2z 

$3x 

S 3y 

S 3z 


(17) 


(18) 


(19) 


In the actual computation, pU , pV , and pW are regarded as the dependent variables instead 
of U , V, and W, because they are conserved quantities and the resulting governing equations are 
relatively simple. Their definition locations are the same as those of U, V, and W. pU is defined at 
(*+!> J, k), pV is defined at + k), and pW is defined at (i, j, k+ \). All other variables, p, p, h, 

and Y a , are defined at. the cell centers. Only p, pU, pV, pW , h, and Y a are the dependent variables 
which are solved directly from the integral conservation equations (1-4). All other parameters are 
determined from the relations (5-10). 

The governing equations for contravariant velocities can be established through coordinate trans- 
formation, then their forms are indeed quite complicated. Actually we can find an easy way to 
obtain the equations by applying the momentum equation to certain control volumes. For example, 
the equation for pUi+Lj^ can be obtained by simply multiplying the Eqn. 2 with the face vector 

Sj + 1 . k , then applied to control volume Vol i+ ij k , which is formed by connecting £-line mid-points 
a-b-c-d-h-e-f-g as shown in Fig. 1 

/ m ^ ^ dQ + f p(q ■ S 1 ) (q ■ n) ds = f S 1 ■ f n ds (20) 

J Clfy JT fj 
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Notice S' 1 = Sj + 1 . k is a constant vector within the control volume fip = Vol i+ ij ih . In the above, 

all q will be eventually expressed in terms of U, V and W by using Eqns. (18,19). We prefer to do 
the transformation later in the succeeding sections, because it will be much easier to do that after 
discretization. 

The momentum equations for pV and pW can be obtained in the similar way by applying to 
Volij + i fc and Vol i j k+ 1 , respectively. 

All the other equations, i.e., mass conservation, energy conservation and species equations, are 
applied to control volume Volij^- They can be put in a general form: 

f + f p<f>(q ■ fi)ds = f A (V0-n)ds+ f Fds+ f Sds (21) 

Jn ot J r J r J T J n 

where <j> — [1, h, Y„] T , F = [0, r n • q, 0] T and S — [0, 0, R a ] T with a = 1, 2, • • • , NS. 

The above equations are not their final forms; the Cartesian velocity q is still used for simplicity. 
It will be replaced by contravariant velocity during the discretization process in the next two sections. 

STAGGERED FINITE VOLUME SCHEME 


In this section, we begin to discretize the governing equations described in the last section. 


Momentum Equations 


The p(7-equation (20) is applied to the staggered control volume Vol i+ 1 • j., which is discretized by 
using finite volume method as 


Vol, 






n+1 




At 


+ • (/>«)f](? ■ • S)i = Vis P V 


i=i 


( 22 ) 


where l is the cell surface index, ranges all the 6 cell surfaces of the control volume Vol i+ ij k . Vis p p 
is the total viscous stress component in S^i j J( . direction acted on the surface of control volume 
Voh, i f It will be described in the next section. 

Based on the idea of MUSCL scheme by Van Leer [10,11], a partially upwind-biased scheme is 
developed to approximate the momentum fluxes through cell surfaces. The basic idea is that the flux 
through the control volume surface is regarded as the product of the mass flow and the conserved 
quantity. According to the sign of mass flux, the conserved quantity is set to its upwind-side value. 
Thanks to the staggered scheme, the mass flux through the surfaces is always directly available. 
There are only two possible locations for all the control volume surfaces, either the surface lies along 
with one of the original grid surfaces or it runs through the original grid cell center. In the former 
case, the mass flux is already defined there. In the latter case, since the Cartesian velocity and density 
are defined at the cell center, the mass flow also can be found straightforwardly. Therefore only the 
conserved quantity at the surface is needed to be interpolated or obtained through reconstruction 
of data from the cell-averaged values like Van Leer’s MUSCL method. This feature ensures that the 
calculated flux is continuous when mass flow changes sign. For example, if the flux ( F ) through a 
control volume surface ( S ) in i direction is consisted of mass flow (M) and the conserved quantity 
(V>), then 

Fi = (M-§)i fa = (M ■ S)f ^ ( -)+(M-S0r V>i(+) (23) 


In the above, the superscripts 
variable, respectively, 


+,- on a variable denote the positive and negative part of the 


M + = max(M, 0), M~ = min(M, 0) 


(24) 



and the superscripts (+),(-) on an index indicate that the variable is taking the limit value on the 
interface from the left or the right, respectively. For instance, in i direction we have: 


V’i(-) = lim^>i, V’it+i = lim ipi (25) 

( — ► t l —n 

High-resolution schemes up to third order can be constructed by setting 

</> 

V'i(-) = V’i — 1/2 H ]j"M(l — K )V + (1 + k)A J^i-i/2 (26) 

ip 

V’i(+) — ’/'» + 1/2 ^*M(1 + K )V + (1 — K)A]V’i+l/2 (27) 

where V and A are backward and forward difference operators, and k is a parameter used to control 
the order of the scheme, k = (1/3) is used in the present method to construct the third-order 

scheme. When k — — 1 the scheme reduces to the second-order fully upwind method. The limiter cr 

is adopted to ensure the monotone interpolation following Koren [14] as: 


$ _ 3Vi/>;_iAl/';_i + 9 

“ 2(Vtf,_i - A*/>,_ i) 2 + 3V^,_ jA^,_i + 8 


(28) 


where 9, a small constant with a typical value of 10 20 , is added to prevent division by zero. 

In our solution algorithm, only (pU) i+ 3 jk ,{pU)i_L j k , (pU) i+kJ+lik ,(pU) i+ Lj- hk , (pU) i+ L jk+1 
(pU)i+i t j ik -i, (pU) i+ Lj ik , Pi,j,k and Pi+i,j,k are treated implicitly for p(/-equation. In general, the 
p [/-equation can be expressed in 6 form as: 

A E 9{pU)i+ij, k + A w 6(pU)i_Lj' k +-A.NS(pU) i+ Lj +lih + As6(pU) i+ i ij ^ lik 
+ A F 6(pU) i+k<jk+1 + A B S(pU) i+ ij k _ i + A c S(j>U) i+ y <k 

+ A p L 6pij tk + A p R 6pi+ij : k — —Ru (29) 


where Ru denotes the residual of pU -equation, including convection and diffusion part. 
Similarly, the momentum equations of pV and pW can be found. 


Scalar Conservation Equations 


All the scalar conservation equations (21) are applied to control volume Volij ik with cell-centered 
finite volume scheme. The above-used upwind-biased scheme with limiter are used for the convection 
terms, second order compact central difference scheme for the diffusion terms. The only exception 
is the mass conservation equation, which benefits most from the staggered grid, the discretized 
equation has the simplest form and is the most compact in space in terms of contravariant flux 
velocity 


where 


- 6 (pU)i-y,k + S(pV)ij+i,t - HpVhj-i,k 

+ 6(pW ) iJik+ 1 - S(pW) iJ k _ i = —Rrriij'k 


Rm,j, k = Vol 


/> n+1 — n n 

Pj,j,k Pi,j,k 


+ (pU)^ Uk -(pU)l Uk 


A t 

+ {pv)h +k ,k ~ - (pw)? jik+i 


(30) 


(31) 


In our solution method the time-dependent term of mass equation is dropped for fast convergence. 
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All other equations are discussed here in their general form (21) except for the source term and 
the stress work term. The source terms of the species equations are usually dominant and of strong 
non-linearity. We will discuss the treatment of those source terms in the next sub-section. The stress 
work term in the energy equation will be discussed in the next section, since it has no contribution 
to the implicit coefficients. If we leave the implicit coefficients contributed by the source terms in 
the next sub-section, the discretized forms of Eqn.(21) are assumed to have the following form: 

®Ef><f>i+l,j,k + + $prWij+l,k + $sWi,j-l,k /on') 

+$Ff><Pi,j,k + 1 + = —Res(<j>) 

where Res(<j> ) is the residual of ^-equation. 

The convection term is discretized by using the same method described in last sub-section for 
convection terms of momentum equations. The diffusion term on the right side of Eqn.(21) is 
discretized through two steps. First we calculate the gradient V0 on the cell surface by applying 
Gauss’s formula to locally-formed staggered control volume, then assemble the integration. Since 
the gradients are computed locally, the resulted scheme reduces to a compact one when regular grid 
is used. 


Implicit Treatment of Reaction Source Term 


The major difficulty in calculation of finite rate combustion is the stiffness of the species equations. 
To solve this problem, the source terms (production rate of chemical reaction) must be treated 
implicitly. 

In the last subsection, the discretization of time dependent, convection and diffusion terms of the 
general scalar conservation equation is discussed. For the chemical species equations, the discretized 
equations can be written as: 


$E$Y ai+ljik + $wbY ai _ lik + $/v<5Y ai , i+ i ,* + ^sSY aiJ _ ljk 
+* F 6Y ai<j ' k+1 + + 4 > C 6Y aiijjk = - [C T (Y a ) n - D T (Y a ) n - R a } 


(33) 


where 6( ) = ( ) n+1 — ( )”, Ct is the convection term and Dt the diffusion term. R a is the reaction 
rate defined in Eqn.(10) 

Nr Ns Ns 

Ra = (34) 


ro= 1 


1=1 


1=1 


w, 


The reaction rate is usually very large and dominant near the flame front. Therefore, implicit 
treatment for the production rate term is necessary. Using Taylor expansion, we have 

K +1 = K + T t ^ 6Y m+J20(SY^). ^ (35) 


By defining 


R = (R 1 ,R 2 ,---,Rn s ) T , 6Y = (6 Y u SY 2 ,---,6Y Ns ) t , D 0 


dR a 

BY’ 


we may have 


R n+1 wR" + D5Y. 


(36) 

(37) 


where D is a Ns by Ns matrix. 

It is apparent that the implicit treatment of R a requires the coupled solution of the species 
equations. By denoting Res a = CtCY*)” — i>r(Y a ) n — R£, the residual of ath species equation and 
Res = (Res i, Res 2 , • • • , ResM s ) T , the residual vector, then Eqn.(33) becomes 


<I>£;I6Yj + i ] j ] fc + ^ivBYi-ij^ -f $ArI5Y,-j +li fc + ^sKYij-i^ 

-t-^FKYj^fc+i + + ($ c l + D)<5Yg y,* = -Res 
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where I is a unit matrix with the elements 


f 0 if 
\ 1 if/ = m 


(39) 


and $ is a scalar. 

Eqn.(38) is the final form of the species equations. They are solved in a coupled way. If line- 
relaxation is used along j-line and Gauss- Seidel iteration used in i, k directions, for instance, then 
the equation (38) is rewritten as 


+ (* C I + D)«Y?j» k + M8Y?p +1>k 

= -Res - SeMY??! j, h ~ M8Y^ k - 4 f HY^ i+1 - ^KY^.j 


(40) 


The left side of above equation forms a block-tridiagonal system, which can be solved by using 
the tailor-made algorithm combined with a Gauss Elimination method for the small block matrix 
inversion. 


VISCOUS STRESS 


Generally the viscous stress acted on a surface S — {S x , S y , S z } with outer normal n = § is defined 


tVj S — tj ;S X + <Sy + T Z S Z 

—■ i'ij'xxSx + 7~yx Sy + T ZX S Z ) + ji'T’xySx -(“ Tyy Sy T Z yS Z ) T k(^T x z S X “i" Ty Z Sy T T ZZ S Z ') (41) 

where i,j, k is the unit vector in x, y, z direction, respectively. 

The viscous force component in n u direction acted on S surface can be obtained by multiplication 
of the above equation with n u : 


(^n * Uu)^ 71u X (T XX S X T Ty X Sy T T ZX S ^) T Tl U y(^T X y S X T TyySy “i“ TzySz'j 

+ Tluz ( Tj;z S x Ty Z Sy T T ZZ S Z '^ 


From Eqn.(5), we have 
hm — 




< dx n 


_L du m 

' dx l 


) 


l ^ m 


-(p+jpV-q) + 2p^ l = 


B 


lm 

-p + | B mm 


l ^ m 
l = m 


(42) 


(43) 


where B^ k = p ijtk . 


i,j ,k 

In the finite volume formulation, velocity strain can be calculated as: 


du ! 


dx " 


ij } k 


Vol 


i,j,k 




j,k 


+ («'5' 2m )i J+ i, <: - («'5' 2m ) iJ _i, fc + (u'S ' 3 "*),. i+i - (u l S 3m ) i j 


ij.k- 


(44) 


Hereafter, all subscripts, except those indicating grid location, are placed upper-right like super- 
scripts to avoid confusing with cell index: 

After substituting the above equation into Eqn.(42), we have: 

8 h 


{T n -n u )S = n™T lm S ] = 1 


nfm elm. rri rd 

n u B S - 6 n u pS 


Im^m ^ rr/ 


(45) 


By introducing difference operator 

)i,j,k = ^l( )i,j,k — ( )»+I,j,fc “ ( )*— 

^»j( )*,j,k ~ ^z( — ( )j,j + ± t fc — ( )jj_I i J; 

S d )t,J,fc = M )i,j,k = ( )i,j,k + k ~ ( )i,j,jfe-l 


(46) 
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and using Eqn.(15), B im then can be expressed in terms of contravariant velocity as: 


= (&)„,, s - K s " m “' r + 


( 47 ) 


The viscous term in Eqn.( 22) can be obtained by applying the above equation to each surface 
of the control volume Vol i+ ^- k , after substituting n u in the above equation with S} + L . fc , 

Vis(U) — k + (5 2 f„ s2 ) i+ i J+ i j. - (S 2 f n ^) i+ ij_i >k 

+{S T „ t — (S -4.1 

= 

+ (i - -3-J + 8 2 (B'™S») i+hjtk + 6 3 (B>™S 3 ') i+Uk (48) 

Similarly, we can find the viscous terms in V- and W 7 -equations. 

In the energy equation, the viscous stress work is: 

J^T n qds = {(S ,1 f n , 1 •«)i + jj 1 fc-(5 1 f n . 1 •gViy.i +(S 2 f ns2 -q) itj+ i k 

~(S 2 r n , 2 ■ q)ij _ + (S 3 f „, 3 • q) iJth+i - ( S 3 T n , 3 ■ 

= 8 n {pq m S nm ) itLk 
( tf 171 \ 

+ (^1 - -3-J S 1 (B ,m q m S u ) iJik + 6 2 (B lm q m S 2, )i !jik + 6 3 (B lm q m S 31 ) iJtk (49) 


SOLUTION PROCEDURE 


To solve the governing equations discretized in foregoing sections, an implicit time-marching method 
has been developed. The governing equations are divided into two sets: the flow part and the 
chemical reaction part. They are solved alternately. Different solving techniques are applied to 
those two sets of equations. In the following, the numerical procedure is described in detail. 


Provision of Reaction Mechanism 


For a given combustion problem, the chemical reaction mechanism is needed to be prescribed besides 
the fuel, oxidizer and boundary conditions. The chemical reaction mechanism is usually obtained 
through experiment. In the numerical simulation, it is represented by the pre-exponential factor 
Aj , the temperature exponent aj and the activation energy Ej of the chemical reaction equations. 
Those parameters and reaction equations are specified through an input data file “mech” provided by 
users in our code. In our test case involved methane-air reaction, the Ci-chain reaction mechanism 
in Table I given by Xu [5] is adopted, in which 16 species are involved in 45 steps reaction chain. 

Thermal and transport parameters are obtained by calling CHEMKIN-II subroutines and data 
bases. 
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Table 1. Ci-Chain Methane- Air Reaction Mechanism. Rate coefficients: 

K = AT a exp(—jHr), units: moles, cubic centimeters, seconds, Kelvins, calories 


No. 

reaction 

A 

a 

E 

1 

ch 3 ah ^ ch 4 

1.90E+36 

-7. 

9050. 

2 

CH 4 + 02 ^ CH 3 + H0 2 

7.90E+13 

0. 

56000. 

3 

ch 4 ah ^ ch 3 + h 2 

2.20E+4 

3. 

8750. 

4 

CH 4 + O CH 3 A OH 

1.60E+6 

2.36 

7400. 

5 

CH 4 + OH ^ CH 3 A H 2 0 

1.60E+6 

2.1 

2460. 

6 

CH 2 0 + OH 5=4 HCO + h 2 o 

7.53E+12 

0. 

167. 

7 

CH 2 0 + H^ HCO A H 2 

3.31E+14 

0. 

10500. 

8 

CH 2 0 + M^ HCO + H + M 

3.31E+16 

0. 

81000. 

9 

CH 2 0 + 0^ HCO + OH 

1.81E+13 

0. 

3082. 

10 

HCO A OH ^ CO A H 2 0 

5.00E+12 

0. 

0. 

11 

HCO + M^H + CO + M 

1.60E+14 

0. 

14700. 

12 

HCO + H^CO + H 2 

4.00E+13 

0. 

0. 

13 

HCO AO^OH + CO 

1.00E+13 

0. 

0. 

14 

HCO + 0 2 ^ H0 2 -(- CO 

3.00E+12 

0. 

0. 

15 

CO AO AM C0 2 + M 

3.20E+13 

0. 

-4200. 

16 

CO A OH ^ C0 2 + H 

1.51E+7 

1.3 

-758. 

17 

CO A 0 2 C0 2 + 0 

1.60E+13 

0. 

41000. 

18 

ch 3 + o 2 ^ ch 3 o + 0 

7.00E+12 

0. 

25652. 

19 

ch 3 o am ^ ch 2 o ah am 

2.40E+13 

0. 

28812. 

20 

ch 3 o ah ^ ch 2 o a h 2 

2.00E+13 

0. 

0. 

21 

ch 3 o a oh ^ ch 2 o a h 2 o 

1.00E+13 

0. 

0. 

22 

CH 3 0 aO^ CH 2 0 + OH 

1.00E+13 

0. 

0. 

23 

ch 3 o ao 2 ^ ch 2 o + ho 2 

6.30E+10 

0. 

2600. 

24 

ch 3 ao 2 ^ ch 2 o a oh 

5.20E+13 

0. 

34574. 

25 

CH 3 aO^ CH 2 0 + H 

6.80E+13 

0. 

0. 

26 

ch 3 aoh^ ch 2 o a h 2 

7.50E+12 

0. 

0. 

27 

H0 2 A CO ^ C0 2 A OH 

5.80E+13 

0. 

22934. 

28 

H 2 + 02 ^ 2 OH 

1.70E+13 

0. 

47780. 

29 

OH A H 2 ^ H 2 0 + H 

1.17E+9 

1.3 

3626. 

30 

H A0 2 ^0H AO 

2.20E+14 

0. 

16800. 

31 

0 + H 2 ^ OH + H 

1.80E+10 

1. 

8826. 

32 

HA0 2 AM^ H0 2 A M a 

2.10E+18 

- 1 . 

0. 

33 

H 4* 02 4* 02 ^ HO 2 4- 02 

6.70E+19 

-1.42 

0. 

34 

H + 0 2 A N 2 ^ H0 2 + N 2 

6.70E+19 

-1.42 

0. 

35 

OH A H0 2 ^ H 2 0 A 0 2 

5.00E+13 

0. 

1000. 

36 

H + H0 2 ^ 2 OH 

2.50E+14 

0. 

1900. 

37 

0 + ho 2 ^o 2 aoh 

4.80E+13 

0. 

1000. 

38 

20 H ^OA H 2 0 

6.00E+8 

1.3 

0. 

39 

H 2 AM^HAHAM b 

2.23E+12 

0.5 

92600. 

40 

02 4* M ^ 0 4 0 4 M 

1.85E+11 

0.5 

95560. 

41 

HAOHAM^H 2 OAM 

7.50E+23 

-2.6 

0. 

42 

H + H0 2 ^H 2 a0 2 

2.50E+13 

0 . 

700. 

43 

ho 2 a ho 2 *=* h 2 o 2 a o 2 

2.00E+12 

0. 

0. 

44 

h 2 o 2 am ^oh a oh am 

1.30E+17 

0. 

45500. 

45 

h 2 o 2 aoh^ h 2 o a ho 2 

1.00E+13 

0. 

1800. 


Third body efficiency with respect to Ar: 
a H 2 0 = 21, H 2 = 3.3, CO = 2.0, C0 2 = 5.0, N 2 = 0 2 = 0. 
b H 2 0 = &,H = 2,H 2 = 3 
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Starting Estimate 


The governing system is highly nonlinear and its solution requires a good starting estimate. 
Similar to the work by Xu et al [5], we use a solution of infinitely fast combustion [9] as our initial 
guess. In the infinitely fast kinetics, the fuel and the oxidizer are separated by a thin exothermic 
reaction zone. In this zone the fuel and oxidizer are in stoichiometric proportion and the temperature 
and products of combustion are maximized. This infinitely fast reaction solution not only provides 
a good initial guess, but also helps overcome the difficulty of ignition with finite-rate combustion. 


Solution Method 


A fully implicit time-stepping scheme is developed. In the laminar case, the system consists of 
21 equations (if there are 16 species). In the turbulent case there will be 23 equations. They are 
solved in groups: 

(a) pU , pV ,pW and p by solving the mass and momentum equations 

(b) k, e, pt by solving the turbulence model in turbulent combustion case 

(c) h, Y a by solving the energy and species equations 
and, finally, updating 

(d) p,p by calling CHEMIKIN-II 

For the flow part, a line-distribution updating scheme [9,15] is used. To further accelerate the 
convergence, a semi-coarsening multigrid method is developed. Here we only point out the techniques 
we used for our specific applications. In our method, the density and pressure are defined at the cell 
centers and the contravariant velocities are defined at cell interfaces. The density and pressure are 
transferred from finer level by area weighting to coarser grid; the contravariant velocities in coarser 
grid are simply set to the sum of those at corresponding interfaces. The residuals on finer grid are 
restricted to coarser by adding up the corresponding part to the staggered stencils. After relaxation 
is completed on coarser grid, the corrections are fed back to finer grid by bilinear interpolation. 

For the reaction part, the energy equation is solved together with the species equations. An 
implicit alternate line- relaxation method is used for the energy equation. The species equations 
are treated in a fully coupled way. The reaction source terms, which are non-linear and usually 
troublesome, are treated implicitly by linearization. The block-line tridiagonal solver combined with 
vectorized pivoting Gauss elimination is used, which was found very effective to handle the sensitivity 
and stiffness of the system. 

The multigrid method is used only for momentum and continuity equations in this work. The 
other equations, such as energy equation, species equations and re, e equations, are solved on a single 
grid. Therefore, we cannot achieve full multigrid efficiency. However, the whole process for solving 
our system is still substantially accelerated. 


BOUNDARY CONDITIONS 


The boundary type usually encountered can be classified as inflow, outflow, solid wall, symmet- 
rical (slip) and periodical. At the inflow boundary, the flow velocity, enthalpy, and chemical species 
are specified, but the pressure is extrapolated from the interior; then the density is found herefrom 
by using the state equation. 

For the outflow boundary, the back pressure is prescribed and other variables are extrapolated 
from the interior. 

For solid wall boundary, since ghost cell is always introduced, both slip (symmetrical) and non- 
slip conditions can be easily implemented with use of contravariant velocities. Take example of wall 
condition on a j = constant plane. For non-slip condition, reverse reflection is applied to all the 
contravariant velocities associated with the ghost cell. For slip (symmetrical) boundary, the reverse 
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reflection is only applied to V, direct reflection is applied U and V. In both cases, the contravariant 
velocity V lies on this j — constant plane is always set to zero. 

The periodical boundary is the simplest. All the values on ghost cell are taken directly from the 
corresponding cell of other side. 

All the boundary conditions are treated fully implicitly through modification of the implicit 
coefficients of the discretized equations at the boundary points. 


NUMERICAL RESULTS 


This method was applied to calculate the strong swirling combustion in a 3-D gas turbine combustor. 
The computational conditions and grid information are summarized in Table 2. 

Table 2 Strong Swirling Combustion in a 3-D Model Combustor 

Table 2.1 Working Conditions 


Inflow Speed 

Fuel 

Oxidizer 

Species Number 

Reaction Steps 

0.0988(average), 30° swirling angle 

Methane 

Air 

16 

45 (Table 1) 


Table 2.2 Summary of CPU Time and Convergence on Different Grids 


Grid 

Iteration Number 

Convergence 

CPU Time' 

Machine 

Cold Flow 

Fast Reaction 

Finite Rate 

49x21x21 

(21,609) 

10 

30 

120 

5.17 orders 

1.77h 

Cray-YMP 

53x29x29 

(44,573) 

10 

30 

120 

3.61 orders 

3.57h 

u 

Cray-YMP 

49x65x65 

(207,025) 

20 

30 

200 

3.30 orders 

21.3h 

Cray-YMP 


The test case shown here is strong swirling combustion in a 3-D gas turbine combustor. Figure 2 
shows the inlet velocity vectors; the fuel and air enter the combustor coaxially with strong circulation. 
Figure 3 shows the calculated temperature isotherms on the center plane. The velocity vectors are 
plotted in Figure 4. The distributions of main chemical species CH 4 , 02 , CO%, H 2 0 and CO are 
presented in form of isopleths in Figures 5-9. A total of 160 time steps are used for this computation, 
including 10 steps for cold flow, 30 steps for fast reaction and 120 steps for the detailed finite-rate 
reaction. During each iteration step, 2 V-multigrid-cycles are performed for the flow part and 2 
iterations for combustion part. For a 49x65x65 grid of 207,025 grid points, the calculation takes 
only about 200 time steps for finite rate calculation and 21.3 CRAY-YMP hours to reduce residuals 
by three orders of magnitude for all governing equations, demonstrating the high efficiency and 
capability of the present method. 
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1 



Figure 2: Velocity vectors at the inlet Figure 4: Vector plots of the flow field on the 

center ( x , y)-plane (laminar) 




Figure 3: Temperature isotherms on the center 
(a:, y)-plane 



Figure 5: CH\ isopleths (mass fraction) on the 
• center (x, y)-plane 
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CONTOUR LEVELS 



Figure 6: 0 2 isopleths (mass fraction) on the cen- Figure 8: H 2 0 isopleths (mass fraction) on the 
ter (a:,y)-plane center (z, y)-plane 



Figure 7: C0 2 isopleths (mass fraction) on the Figure 9: CO isopleths (mass fraction) on the 
center (x, y)-plane center (a;,y)-plane 
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